Interpretability Framework for Differentially Private Deep Learning

ABSTRACT

Data is received that specifies a bound for an adversarial posterior belief ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output. Privacy parameters ε, δ are then calculated based on the received data that govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset. The calculating is based on a ratio of probability distributions of different observations, which are bound by the posterior belief ρ_(c) as applied to a dataset. The calculated privacy parameters are then used to apply the DP algorithm to the function over the dataset. Related apparatus, systems, techniques and articles are also described.

TECHNICAL FIELD

The subject matter described herein relates to an interpretability framework for calculating confidence levels and expected membership advantages of an adversary in identifying members of a training dataset used with training machine learning models.

BACKGROUND

Machine learning models can leak sensitive information about training data. To address such situations, noise can be added during the training process via differential privacy (DP) to mitigate privacy risk. To apply differential privacy, data scientists choose DP parameters (ϵ, δ). However, interpreting and choosing DP privacy parameters (ϵ, δ), and communicating the factual guarantees with regard to re-identification risk and plausible deniability, is still a cumbersome task for non-experts. Different approaches for justification and interpretation of DP privacy parameters have been introduced which stray from the original DP definition by offering an upper bound on privacy in face of an adversary with arbitrary auxiliary knowledge.

SUMMARY

In a first aspect, data is received that specifies a bound for an adversarial posterior belief ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output. Privacy parameters ε, δ are then calculated based on the received data that govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset. The calculating is based on a ratio of probability distributions of different observations, which are bound by the posterior belief ρ_(c) as applied to a dataset. The calculated privacy parameters are then used to apply the DP algorithm to the function over the dataset.

The probability distributions can be generated using a Gaussian mechanism with an (ε, δ) guarantee that perturbs the result of the function evaluated over the dataset, preventing a posterior belief greater than ρ_(c) on the dataset.

The probability distributions can be generated using a Laplacian mechanism with an ε guarantee that perturbs the result of the function evaluated over the dataset, preventing a posterior belief greater than ρ_(c) on the dataset.

The resulting dataset (i.e., the dataset after application of the DP algorithm to the function over the dataset) can be used for various applications including training a machine learning model. Such a trained machine learning model can be deployed and then classify data input therein.

Privacy parameter ϵ can equal log(ρ_(c)/(1−ρ_(c))) for a series of (ε, δ) or ε anonymized function evaluations with multidimensional data.

A resulting total posterior belief ρ_(c) can be calculated using a sequential composition or Rényi differential privacy (RDP) composition. The at least one machine learning model can be updated using the calculated resulting total posterior belief ρ_(c).

In an interrelated aspect, data is received that specifies privacy parameters ε, δ which govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset. The received data is then used to calculate an expected membership advantage ρ_(α) that corresponds to a probability of an adversary successfully identifying a member in the dataset. Such calculating can be based on an overlap of two probability distributions. The calculated expected membership advantage ρ_(α) can be used when applying the DP algorithm to a function over the dataset.

The probability distributions can be generated using a Gaussian mechanism with an (ε, δ) guarantee that perturbs the result of the function evaluated over the dataset, ensuring that the membership advantage is ρ_(α) on the dataset.

The probability distributions can be generated using a Laplacian mechanism with an ε guarantee that perturbs the result of the function evaluated over the dataset, ensuring that the membership advantage is ρ_(α) on the dataset.

The resulting dataset (i.e., the dataset after application of the DP algorithm to the function over the dataset) can be used to train at least one machine learning model. Such a trained machine learning model can be deployed so as to classify further data input therein.

The calculated expected membership advantage ρ_(α) for a series of (ε, δ) anonymized function evaluations with multidimensional data is equal to:

$\mathrm{CDF}\left( \frac{\epsilon}{2\sqrt{2\ln\left( \frac{1.25}{\delta} \right)}} \right) - \mathrm{CDF}\left( \frac{-\epsilon}{2\sqrt{2\ln\left( \frac{1.25}{\delta} \right)}} \right)$

wherein CDF is the cumulative distribution function of the standard normal distribution.
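For illustration, the following is a minimal Python sketch of this calculation, assuming SciPy's standard normal CDF; the helper name expected_membership_advantage is illustrative and not part of the described system:

from math import log, sqrt

from scipy.stats import norm


def expected_membership_advantage(epsilon: float, delta: float) -> float:
    """Evaluate CDF(eps / (2*sqrt(2*ln(1.25/delta)))) - CDF(-eps / (2*sqrt(2*ln(1.25/delta))))."""
    scale = 2.0 * sqrt(2.0 * log(1.25 / delta)) / epsilon  # doubled noise-to-sensitivity ratio
    return norm.cdf(1.0 / scale) - norm.cdf(-1.0 / scale)


print(expected_membership_advantage(epsilon=1.0, delta=1e-6))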

A resulting expected membership advantage ρ_(α) can be calculated using sequential composition or Rényi differential privacy (RDP) composition. The calculated resulting expected membership advantage ρ_(α) can be used to update the at least one machine learning model.

In a further interrelated aspect, data is received that specifies privacy parameters ε, δ which govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset. Thereafter, the received data is used to calculate an adversarial posterior belief bound ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output. Such calculating can be based on an overlap of two probability distributions. The DP algorithm can then be applied, using the calculated adversarial posterior belief bound ρ_(c), to a function over the dataset to result in an anonymized function output (e.g., machine learning model, etc.).

Posterior belief bound ρ_(c) can equal 1/(1+e^(−ϵ)) for a series of (ε, δ) or ε anonymized function evaluations with multidimensional data.

Data can be received that specifies an expected adversarial posterior belief bound expected ρ_(c) such that ρ_(c)=expected ρ_(c)+δ*(1−expected ρ_(c)).
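A minimal Python sketch of these relations, assuming the bound ρ(ϵ)=1/(1+e^(−ϵ)) and the δ correction stated above (helper names are illustrative):

from math import exp, log


def confidence_bound(epsilon: float, delta: float = 0.0) -> float:
    """rho_c = rho(eps) + delta * (1 - rho(eps)), with rho(eps) = 1 / (1 + e^-eps)."""
    rho = 1.0 / (1.0 + exp(-epsilon))
    return rho + delta * (1.0 - rho)


def epsilon_for_confidence(rho_c: float) -> float:
    """Invert rho(eps) = 1 / (1 + e^-eps): eps = log(rho_c / (1 - rho_c))."""
    return log(rho_c / (1.0 - rho_c))


print(confidence_bound(1.0, 1e-5))   # expected adversarial confidence for (1, 1e-5)
print(epsilon_for_confidence(0.9))   # epsilon that keeps the posterior belief at 0.9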

The probability distributions can be generated using a differential privacy mechanism either with an (ε, δ) guarantee or with an ε guarantee that perturbs the result of the function evaluated over the dataset, preventing a posterior belief greater than ρ_(c) on the dataset.

At least one machine learning model can be anonymously trained using the resulting dataset (i.e., the dataset after application of the DP algorithm to the function over the dataset). A resulting total posterior belief ρ_(c) can be calculated using a sequential composition or Rényi differential privacy (RDP) composition. The at least one machine learning model can be updated using the calculated resulting total posterior belief ρ_(c).

In a still further interrelated aspect, a dataset is received. Thereafter, at least one first user-generated privacy parameter is received which governs a differential privacy (DP) algorithm to be applied to a function evaluated over the received dataset. Using the received at least one first user-generated privacy parameter, at least one second privacy parameter is calculated based on a ratio or overlap of probabilities of distributions of different observations. Thereafter, the DP algorithm is applied, using the at least one second privacy parameter, to the function over the received dataset to result in an anonymized function output (e.g., machine learning model, etc.). At least one machine learning model can be anonymously trained using the dataset which, when deployed, is configured to classify input data.

The machine learning model(s) can be deployed once trained to classify input data when received.

The at least one first user-generated privacy parameter can include a bound for an adversarial posterior belief ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output. With such an arrangement, the calculated at least one second privacy parameter can include privacy parameters ε, δ and the calculating can be based on a ratio of probability distributions of different observations which are bound by the posterior belief ρ_(c) as applied to the dataset.

In another variation, the at least one first user-generated privacy parameter includes privacy parameters ε, δ. With such an implementation, the calculated at least one second privacy parameter can include an expected membership advantage ρ_(α) that corresponds to a probability of an adversary successfully identifying a member in the dataset and the calculating can be based on an overlap of two probability distributions.

In still another variation, the at least one first user-generated privacy parameter can include privacy parameters ε, δ. With such an implementation, the calculated at least one second privacy parameter can include an adversarial posterior belief bound ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output and the calculating can be based on an overlap of two probability distributions.

Non-transitory computer program products (i.e., physically embodied computer program products) are also described that store instructions, which when executed by one or more data processors of one or more computing systems, cause at least one data processor to perform operations herein. Similarly, computer systems are also described that may include one or more data processors and memory coupled to the one or more data processors. The memory may temporarily or permanently store instructions that cause at least one processor to perform one or more of the operations described herein. In addition, methods can be implemented by one or more data processors either within a single computing system or distributed among two or more computing systems. Such computing systems can be connected and can exchange data and/or commands or other instructions or the like via one or more connections, including but not limited to a connection over a network (e.g., the Internet, a wireless wide area network, a local area network, a wide area network, a wired network, or the like), via a direct connection between one or more of the multiple computing systems, etc.

The subject matter described herein provides many technical advantages. For example, the current framework provides enhanced techniques for selecting a privacy parameter ϵ based on the re-identification confidence ρ_(c) and expected membership advantage ρ_(α). These advantages were demonstrated on synthetic data, reference data and real-world data in a machine learning and data analytics use case which shows that the current framework is suited for multidimensional queries under composition. The current framework furthermore allows the optimization of the utility of differentially private queries at the same (ρ_(c), ρ_(α)) by considering the sensitive range S(ƒ) instead of global sensitivity Δƒ. The framework allows data owners and data scientists to map their expectations of utility and privacy, and derive the consequent privacy parameters ϵ.

The details of one or more variations of the subject matter described herein are set forth in the accompanying drawings and the description below. Other features and advantages of the subject matter described herein will be apparent from the description and drawings, and from the claims.

DESCRIPTION OF DRAWINGS

FIGS. 1A and 1B are diagrams respectively illustrating posterior beliefs over the output space of M_(Gau) and M_(Lap);

FIGS. 2A and 2B are diagrams respectively illustrating decision boundaries based on PDFs and confidence;

FIGS. 3A and 3B are diagrams illustrating A_(adapt) error regions for varying ε for M_(Gau), with f(D)=0 and f(D′)=1;

FIGS. 4A and 4B are diagrams respectively illustrating the expected adversarial worst-case confidence bound ρ_(c) and the adversarial membership advantage ρ_(α) for various (ϵ, δ) when using M_(Gau) for perturbation;

FIGS. 5A and 5B are diagrams illustrating a sample run of A_(adapt) on a sequence of k=100 evaluations of M_(Gau,i), showing the mechanism outputs on the left y-axis and the development of confidences on D and D′ on the right-hand y-axis; at the end of the run, A_(adapt) decides for the dataset with the highest confidence. In FIG. 5B, the number of decisions over all runs is shown;

FIGS. 6A-6D are diagrams illustrating a confidence distribution of A_(adapt) at the end of 10,000 runs, i.e., after composition, over different ε and fixed δ=0.001;

FIGS. 7A-7D are diagrams illustrating a confidence distribution of A_(adapt) at the end of 30 epochs, i.e., after composition with δ=0.001; these diagrams show the distribution for global sensitivity, which yields strong privacy but little utility, and for Δf₂=S(f), which yields a distribution identical to its counterpart using synthetic data;

FIG. 8 is a diagram illustrating confidence distribution after 30 epochs with privacy parameters ρ_(c)=0.9, δ=0.01;

FIGS. 9A and 9B are diagrams illustrating sensitivity and test accuracy over 30 epochs;

FIGS. 10A-10C are diagrams illustrating utility and privacy metrics for the GEFCom challenge;

FIG. 11 is a first process flow diagram illustrating an interpretability framework for differentially private deep learning;

FIG. 12 is a second process flow diagram illustrating an interpretability framework for differentially private deep learning;

FIG. 13 is a third process flow diagram illustrating an interpretability framework for differentially private deep learning;

FIG. 14 is a fourth process flow diagram illustrating an interpretability framework for differentially private deep learning; and

FIG. 15 is a diagram of a computing device for implementing aspects of an interpretability framework for differentially private deep learning.

Like reference symbols in the various drawings indicate like elements.

DETAILED DESCRIPTION

Provided herein is an interpretability framework for calculating the confidence ρ_(c) and expected membership advantage ρ_(α) of an adversary in identifying members of training data used in connection with one or more machine learning models. These metrics are derived a priori for multidimensional, iterative computations, as found in machine learning. The framework is compatible with composition theorems and alternative differential privacy definitions like Rényi Differential Privacy, offering a tight upper bound on privacy. For illustration purposes, the framework and the resulting utility are evaluated on synthetic data, in a deep learning reference task, and in a real-world electric load forecasting benchmark.

The current subject matter provides a generally applicable framework for interpretation of the DP guarantee in terms of an adversary's confidence and expected membership advantage for identifying the dataset on which a differentially private result was computed. The framework adapts to various DP mechanisms (e.g., Laplace, Gaussian, Exponential) for scalar and multidimensional outputs and is well-defined even under composition. The framework allows users to empirically analyze a worst-case adversary under DP, but also gives analytical bounds with regard to maximum confidence and expected membership advantage.

The current subject matter, in particular, can be used to generate anonymized function output within specific privacy parameter bounds which govern the difficulty of getting insight into the underlying input data. Such anonymous function evaluations can be used for various purposes including training of machine learning models which, when deployed, can classify future data input into such models.

Also provided herein are illustrations of how different privacy regimes can be determined by the framework independent of a specific use case.

Still further, with the current subject matter, privacy parameters for abstract composition theorems such as Rényi Differential Privacy in deep learning can be inferred from the desired confidence and membership advantage in our framework.

Differential Privacy.

Generally, data analysis can be defined as the evaluation of a function ƒ: DOM→R on some dataset D∈DOM yielding a result r∈R. Differential privacy is a mathematical definition for anonymized analysis of datasets. In contrast to previous anonymization methods based on generalization (e.g., k-anonymity), DP perturbs the result of a function ƒ(⋅) over a dataset D={d₁, . . . , d_(n)} s.t. it is no longer possible to confidently determine whether ƒ(⋅) was evaluated on D or some neighboring dataset D′ differing in one individual. The neighboring dataset D′ can be created by either removing one data point from D (unbounded DP) or by replacing one data point in D with another from DOM (bounded DP). Thus, privacy is provided to participants in the dataset since the impact of their presence (absence) on the query result becomes negligible. To inject differentially private noise into the result of some arbitrary function ƒ(⋅), mechanisms M fulfilling Definition 1 are utilized.

Definition 1 ((ϵ, δ)-Differential Privacy)

A mechanism M gives (ϵ, δ)-Differential Privacy if for all D, D′⊆DOM differing in at most one element, and all outputs S⊆R

Pr(M(D)∈S)≤e^(ϵ)·Pr(M(D′)∈S)+δ

ϵ-DP is defined as (ϵ, δ=0)-DP, and the application of a mechanism M to a function ƒ(⋅) is referred to as output perturbation. DP holds if mechanisms are calibrated to the global sensitivity, i.e., the largest influence a member of the dataset can cause on the outcome of any ƒ(⋅). Let D and D′ be neighboring datasets; the global ℓ₁-sensitivity of a function ƒ is defined as Δƒ=max_(D,D′)∥ƒ(D)−ƒ(D′)∥₁. Similarly, Δƒ₂=max_(D,D′)∥ƒ(D)−ƒ(D′)∥₂ can be referred to as the global ℓ₂-sensitivity.

A popular mechanism for perturbing the outcome of numerical query functions ƒ is the Laplace mechanism. Following Definition 1, the Laplace mechanism adds noise calibrated to Δƒ by drawing noise from the Laplace distribution with mean μ=0.

Theorem 1 (Laplace Mechanism).

Given a numerical query function ƒ: DOM→R^(k), the Laplace mechanism

M_(Lap)(D, ƒ, ϵ):=ƒ(D)+(z₁, . . . , z_(k))

is an ϵ-differentially private mechanism when all z_(i) with 1≤i≤k are independently drawn from the Laplace distribution z_(i)˜Lap(z, λ=Δƒ/ϵ, μ=0).
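A minimal sketch of such a Laplace mechanism in Python, assuming NumPy for noise sampling (the helper name laplace_mechanism and the example values are illustrative):

import numpy as np


def laplace_mechanism(true_result: np.ndarray, sensitivity: float, epsilon: float,
                      rng: np.random.Generator = None) -> np.ndarray:
    """Perturb f(D) with Laplace noise of scale lambda = sensitivity / epsilon (Theorem 1)."""
    rng = rng or np.random.default_rng()
    scale = sensitivity / epsilon
    return true_result + rng.laplace(loc=0.0, scale=scale, size=np.shape(true_result))


# Example: a sum query with global L1-sensitivity 9 (values between $1 and $10), epsilon = 1.
print(laplace_mechanism(np.array([17.0]), sensitivity=9.0, epsilon=1.0))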

A second DP mechanism used for output perturbation within this work is the Gaussian mechanism of Theorem 2. The Gaussian mechanism uses ℓ₂-sensitivity.

Theorem 2 (Gaussian Mechanism).

Given a numerical query function ƒ: DOM→R^(k), there exists σ s.t. the Gaussian mechanism

M_(Gau)(D, ƒ, ϵ, δ):=ƒ(D)+(z₁, . . . , z_(k))

is an (ϵ, δ)-differentially private mechanism for a given pair of ϵ, δ∈(0, 1) when all z_(i) with 1≤i≤k are independently drawn from z_(i)˜N(0, σ²).

Prior work has analyzed the tails of the normal distribution and found that bounding σ>Δƒ₂·√(2 ln(1.25/δ))/ϵ fulfills Theorem 2. However, these bounds have been shown to be loose and result in overly pessimistic noise addition.
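A minimal sketch of the Gaussian mechanism calibrated with this classic σ bound, assuming NumPy (names and parameter values are illustrative):

from math import log, sqrt

import numpy as np


def gaussian_mechanism(true_result: np.ndarray, l2_sensitivity: float,
                       epsilon: float, delta: float,
                       rng: np.random.Generator = None) -> np.ndarray:
    """Perturb f(D) with N(0, sigma^2) noise, sigma = Delta_f2 * sqrt(2 ln(1.25/delta)) / epsilon."""
    rng = rng or np.random.default_rng()
    sigma = l2_sensitivity * sqrt(2.0 * log(1.25 / delta)) / epsilon
    return true_result + rng.normal(loc=0.0, scale=sigma, size=np.shape(true_result))


print(gaussian_mechanism(np.array([17.0]), l2_sensitivity=9.0, epsilon=1.0, delta=1e-5))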

Definition 2 ((α, ϵ_(RDP))-Differential Privacy)

A mechanism M gives (α, ϵ_(RDP))-RDP if for any adjacent D, D′⊆DOM and α>1

$D_{\alpha}\left( \mathcal{M}(\mathcal{D}) \,\|\, \mathcal{M}(\mathcal{D}^{\prime}) \right) = \frac{1}{\alpha - 1}\ln\; \mathbb{E}_{x\sim\mathcal{M}(\mathcal{D}^{\prime})}\left[ \left( \frac{\mathcal{M}(\mathcal{D})}{\mathcal{M}(\mathcal{D}^{\prime})} \right)^{\alpha} \right] \leq \epsilon_{RDP}$

Calibrating the Gaussian mechanism in terms of Rényi differential privacy (RDP) is straightforward due to the relation ϵ_(RDP)=α·Δƒ₂²/(2σ²). One option is to split σ=Δƒ₂·η, where η is called the noise multiplier and is the actual term dependent on ϵ_(RDP) as Δƒ₂ is fixed. An (α, ϵ_(RDP))-RDP guarantee converts to

$\left( {{\epsilon_{RDP} - \frac{\ln\;\delta}{\alpha - 1}},\delta} \right) - {DP}$

which is not trivially invertible as multiple (α, ϵ_(RDP)) pairs yield the same (ϵ, δ)-DP guarantee. A natural choice is to search for an (α, ϵ_(RDP)) causing η to be as low as possible. Hence, ϵ can be expanded as follows:

$\epsilon = \epsilon_{RDP} - \frac{\ln\delta}{\alpha - 1} = \frac{\alpha\,\Delta f_{2}^{2}}{2\sigma^{2}} - \frac{\ln\delta}{\alpha - 1} = \frac{\alpha}{2\eta^{2}} - \frac{\ln\delta}{\alpha - 1}$

and minimize

$\eta = \min_{\alpha}\sqrt{\frac{\alpha}{2\left( \epsilon + \frac{\ln\delta}{\alpha - 1} \right)}}$

which provides a tight bound on η and thus on σ for given (ϵ, δ).
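A minimal sketch of this minimization as a grid search over α, assuming the relation η=√(α/(2(ϵ+ln δ/(α−1)))) derived above; the search range and step are illustrative:

from math import log, sqrt

import numpy as np


def noise_multiplier(epsilon: float, delta: float):
    """Search alpha > 1 for the smallest eta = sqrt(alpha / (2 * (eps + ln(delta)/(alpha-1))))."""
    best_alpha, best_eta = None, float("inf")
    for alpha in np.linspace(1.01, 200.0, 100000):
        denominator = epsilon + log(delta) / (alpha - 1.0)
        if denominator <= 0.0:            # this alpha cannot reach the requested (eps, delta)
            continue
        eta = sqrt(alpha / (2.0 * denominator))
        if eta < best_eta:
            best_alpha, best_eta = alpha, eta
    return best_alpha, best_eta


alpha, eta = noise_multiplier(epsilon=2.0, delta=1e-5)
print(alpha, eta)   # sigma = Delta_f2 * eta for the Gaussian mechanism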

The standard approach to analyze the privacy decay over a series of ϵ-DP mechanisms is the sequential composition theorem.

Theorem 3 (Sequential Composition).

Let M_(i) provide (ϵ_(i), δ_(i))-Differential Privacy. The sequence of M_(1, . . . ,k)(D) provides (Σ_(i) ϵ_(i), Σ_(i) δ_(i))-DP.

Sequential composition is, again, loose for (ϵ, δ)-DP, which has resulted in various advanced theorems for composition. Yet, tight composition bounds are also studied in the RDP domain, which has the nice property of ϵ_(RDP,i) being summed up as well. So, for a sequence of k mechanism executions, each providing (α, ϵ_(RDP,i))-RDP, the total guarantee composes to (α, Σ_(i) ϵ_(RDP,i))-RDP. Using the equations above, a tight per-step η can be derived from this.

These aspects of DP build the foundations of private deep learning. In private deep learning, the tradeoff between privacy and utility becomes important because practical neural networks need to offer a certain accuracy. Although increasing privacy through the (ϵ, δ) guarantee always decreases utility, other factors also affect the accuracy of models, such as the quantity of training data and the value chosen for the clipping norm C.

Various properties of C affect its optimal value. Unfortunately, Δƒ₂ cannot be determined in advance for the size of gradients, so it has been proposed to clip each per-example gradient to C, bounding the influence of one example on an update. This parameter can be set to maximize model accuracy, and one rule is to set C to "the median of the norms of the unclipped gradients over the course of training." The following effects can be taken into account: the clipped gradient may point in a different direction from the original gradient if C is too small, but if C is too large, the large magnitude of noise added decreases utility. Since gradients change over the course of training, the optimal value of C at the beginning of training may no longer be optimal toward the end of training. Adaptively setting the clipping norm may further improve utility by changing C as training progresses or setting C differently for each layer. To improve utility for a set privacy guarantee, the value of C can be tuned and adapted.
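A minimal sketch of per-example clipping followed by Gaussian perturbation of the summed gradient, in the spirit of the DP optimizers discussed above; the clipping norm, noise multiplier, and helper names are illustrative assumptions, not the exact optimizer described herein:

import numpy as np


def clip_and_perturb(per_example_grads: np.ndarray, clipping_norm: float,
                     noise_multiplier: float, rng: np.random.Generator = None) -> np.ndarray:
    """Clip each per-example gradient to L2 norm C, sum, and add N(0, (C*eta)^2) noise."""
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clipping_norm / np.maximum(norms, 1e-12))
    clipped_sum = (per_example_grads * factors).sum(axis=0)
    noise = rng.normal(0.0, clipping_norm * noise_multiplier, size=clipped_sum.shape)
    return (clipped_sum + noise) / len(per_example_grads)   # averaged noisy update


grads = np.random.default_rng(0).normal(size=(8, 4))        # 8 examples, 4 parameters
print(clip_and_perturb(grads, clipping_norm=1.0, noise_multiplier=1.1))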

TABLE 1

A_(prob) notations

Symbol — Description

U — Data universe, i.e., a set of individuals that are possibly present in the original dataset D.

K ⊂ D — The subset of records of whom A_(prob) knows they have participated in D.

n — |D|.

M — Mechanism and parameters, e.g., (ϵ, δ, ƒ) for M_(Lap).

ƒ — Data analysis function ƒ(•).

r — Differentially private output r yielded by M.

p_(ω)(•) — Probability density function of M given world ω.

Strong Probabilistic Adversary.

For interpretation of (ϵ, δ), the privacy guarantee ϵ is considered with regard to a desired bound on the Bayesian belief of a probabilistic adversary A_(prob). A_(prob)'s knowledge is modeled as the tuple (U, n, M, ƒ, r) which is defined in Table 1.

A_(prob) seeks to identify D by evaluating possible combinations of missing individuals drawn from U, which can be formally denoted as possible worlds:

Ψ={K∪{d₁, . . . , d_(n)} | d₁, . . . , d_(n)∈U\K}

A_(prob) assigns a probability to each world ω∈Ψ, reflecting the confidence that ω was used as input to M. This confidence can be referred to as belief β(ω). The posterior belief of A_(prob) on world ω_(i) is defined as a conditional probability:

$\begin{aligned}
\beta(\omega_{i}) &= \Pr\left( \omega_{i} \mid \mathcal{M}(\cdot) = r \right) \\
&= \frac{\Pr\left( \mathcal{M}(\cdot) = r \mid \omega_{i} \right) \cdot \Pr(\omega_{i})}{\Pr\left( \mathcal{M}(\cdot) = r \right)} \\
&= \frac{\Pr\left( \mathcal{M}(\cdot) = r \mid \omega_{i} \right) \cdot \Pr(\omega_{i})}{\sum_{j}\Pr\left( \mathcal{M}(\cdot) = r \mid \omega_{j} \right) \cdot \Pr(\omega_{j})} \\
&= \frac{\Pr\left( \mathcal{M}(\omega_{i}) = r \right) \cdot \Pr(\omega_{i})}{\sum_{j}\Pr\left( \mathcal{M}(\omega_{j}) = r \right) \cdot \Pr(\omega_{j})} \quad (1) \\
&= \frac{p_{\omega_{i}}(r) \cdot \Pr(\omega_{i})}{\sum_{j} p_{\omega_{j}}(r) \cdot \Pr(\omega_{j})} \quad (2)
\end{aligned}$

Using the fact that M represents a continuous random variable and the choice of worlds is discrete, Bayes' theorem allows inserting M's probability density function (PDF) in step (2). The firmest guess of A_(prob) is represented by the world ω having the highest corresponding belief. However, it is not guaranteed that this world represents the true world. From this point, the terms confidence and posterior belief are used interchangeably.

The initial distribution over ω reflects the prior belief of A_(prob) on each world. It is assumed that this is a discrete uniform distribution among worlds, thus

$\Pr(\omega) = \frac{1}{|\Psi|} \quad \forall\omega \in \Psi.$

By bounding the belief β(ω̃) for the true world ω̃ by a chosen constant ρ, a desired level of privacy can be guaranteed. It is noted that bounding the belief for the true world implicitly also bounds the belief for any other world.

The noise added to hide ω̃ can be scaled to the sensitivity of the result to a change in ω̃. Instead of basing this value on global sensitivity, the largest possible contribution of any individual can be quantified as the sensitive range S(ƒ).

Definition 3 (Sensitive Range S(ƒ))

The sensitive range of a query function ƒ is the range of ƒ:

S(ƒ)=max_(ω₁,ω₂∈Ψ)∥ƒ(ω₁)−ƒ(ω₂)∥

This approach resulted in the introduction of differential identifiability, which is defined below in Definition 4.

Definition 4 (Differential Identifiability)

Given a dataset D, a randomized mechanism M satisfies ρ-Differential Identifiability if among all possible datasets D₁, D₂, . . . , D_(m) differing in one individual w.r.t. D, the posterior belief β, after getting the response r∈R, is bounded by ρ:

β(D_(i) | M(⋅)=r)≤ρ.  (3)

The notation of possible world ω∈Ψ is replaced by possible datasets, which is semantically the same. ρ-Differential Identifiability implies that after receiving a mechanism's output r, the true dataset D can be identified by A_(prob) with confidence β(D)≤ρ.

DP and Differential Identifiability have been shown to be equal when |Ψ|=2, since DP considers two neighboring datasets D, D′ by definition. Specifically, Differential Identifiability is equal to bounded DP in this case, since possible worlds each have the same number of records. Under this assumption, the sensitive range S(ƒ) represents a special case of local sensitivity in which both D and D′ are fixed. It can be assumed that Δƒ is equal to S(ƒ). If this condition is met, the relation ρ↔ϵ for M_(Lap) is:

$\frac{\mathcal{S}(f)}{\lambda} = \epsilon = \ln\left( \frac{\rho}{1 - \rho} \right) \;\leftrightarrow\; \rho = \frac{1}{1 + e^{-\frac{\mathcal{S}(f)}{\lambda}}} = \frac{1}{1 + e^{-\epsilon}} > \frac{1}{2}. \quad (4)
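A minimal numerical check of Equation (4) in Python, assuming SciPy's Laplace PDF and a scalar query with ƒ(D)=0, ƒ(D′)=1; all values are illustrative:

from math import exp

import numpy as np
from scipy.stats import laplace


def posterior_belief(r: float, f_d: float, f_d_prime: float, scale: float) -> float:
    """Belief for D given output r and uniform priors, via the Laplace PDFs (Equation (2))."""
    p_d = laplace.pdf(r, loc=f_d, scale=scale)
    p_d_prime = laplace.pdf(r, loc=f_d_prime, scale=scale)
    return p_d / (p_d + p_d_prime)


epsilon, sensitivity = 1.0, 1.0
scale = sensitivity / epsilon
outputs = np.linspace(-10, 11, 1000)                    # sweep over possible mechanism outputs
beliefs = [posterior_belief(r, 0.0, 1.0, scale) for r in outputs]
print(max(beliefs), 1.0 / (1.0 + exp(-epsilon)))        # maximum belief matches the bound rho(eps)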

Framework for Interpreting DP.

Based on the original strong probabilistic adversary A_(prob) provided above, an interpretability framework is formulated that allows formal (ϵ, δ) guarantees to be translated into concrete re-identification probabilities. First, the original confidence upper bound of Equation (3) can be extended to work with arbitrary DP mechanisms, and a discussion is provided with regard to how δ is integrated into the confidence bound. Second, A_(prob) is extended to behave adaptively with regard to a sequence of mechanisms. It is shown below that the resulting adaptive adversary A_(adapt) behaves as assumed by composition theorems. Third, expected membership advantage ρ_(α) is defined and suggested as a privacy measure complementing ρ, which is referred to as ρ_(c) in the following.

General Adversarial Confidence Bound.

According to Equation (4), the probabilistic adversary A_(prob) with unbiased priors (i.e., 0.5) regarding neighboring datasets D, D′ has a maximum posterior belief of 1/(1+e^(−ϵ)) when the ϵ-differentially private Laplace mechanism (cf. Theorem 1) is applied to ƒ having a scalar output. In the following, it is shown that this upper bound also holds for arbitrary ϵ-differentially private mechanisms with multidimensional output. Therefore, the general belief calculation of Equation (1) can be bound by the inequality of Definition 1.

$\begin{matrix}{{\beta(\mathcal{D})} = \frac{\Pr\left( {{\mathcal{M}(\mathcal{D})} = r} \right)}{{\Pr\left( {{\mathcal{M}(\mathcal{D})} = r} \right)} + {\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}}} \\{\leq \frac{{{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)} \cdot e^{\epsilon}} + \delta}{{{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)} \cdot e^{\epsilon}} + \delta + {\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}}} \\{= \frac{1}{1 + \frac{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}{{{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)} \cdot e^{\epsilon}} + \delta}}}\end{matrix}$

For δ=0, the last equation simplifies to 1/(1+e^(−ϵ)), so it can be concluded:

Corollary 1.

For any ϵ-differentially private mechanism, the strong probabilistic adversary's confidence on either dataset D, D′ is bounded by

${\rho(\epsilon)} = \frac{1}{1 + e^{- \epsilon}}$

For δ>0, however, it was observed that where Pr(M(D′)=r) becomes very small, β(D) grows towards 1:

$\begin{matrix}{{\lim\limits_{{\Pr{({\mathcal{M}{(\mathcal{D}^{\prime})}})}}\rightarrow 0}\frac{1}{1 + \frac{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}{{{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)} \cdot e^{\epsilon}} + \delta}}} = 1} & (5)\end{matrix}$

Hence, if the Gaussian mechanism M_(Gau) samples a value at the tails of the distribution in the direction away from ƒ(D′), the posterior beliefs for D and D′ head to 1 and 0, respectively. If a value is sampled from the tails in the direction of ƒ(D′), the posterior beliefs for D and D′ go to 0 and 1, respectively. The difference in behavior between the Laplace and Gaussian mechanism when large values of noise are sampled is demonstrated. Fixed values ƒ(D)=0, ƒ(D′)=1 and Δƒ=Δƒ₂=1 can be utilized. In diagram 100 b of FIG. 1B, the effect of the output of M_(Lap) on the posterior beliefs for D and D′ when ϵ=1, δ=0 is shown. In contrast, M_(Gau) results in an upper bound of 1, as is visualized in diagram 100 a of FIG. 1A. Therefore, β(D) can only be bound for 1−δ of the outputs of M_(Gau), since M_(Gau) provides ϵ-DP with probability 1−δ.

β is now extended to k-dimensional (ϵ, δ)-differentially private mechanisms where ƒ: DOM→R^(k).

Theorem 4.

The general confidence bound of Corollary 1 holds for multidimensional (ϵ, δ)-differentially private mechanisms with probability 1−δ.

Proof.

Properties of RDP can be used to prove the confidence bound for multidimensional (ϵ, δ)-differentially private mechanisms.

$\begin{matrix}\begin{matrix}{{\beta(\mathcal{D})} = \frac{\Pr\left( {{\mathcal{M}(\mathcal{D})} = r} \right)}{{\Pr\left( {{\mathcal{M}(\mathcal{D})} = r} \right)} + {\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}}} \\{= \frac{1}{1 + {{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}/{\Pr\left( {{\mathcal{M}(\mathcal{D})} = r} \right)}}}}\end{matrix} & (6) \\{\leq \frac{1}{1 + \frac{\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}{\left( {e^{\epsilon_{RDP}} \cdot {\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}} \right)^{1 - {1/\alpha}}}}} & (7) \\{= \frac{1}{1 + {e^{- {\epsilon_{RDP}{({1 - {1/\alpha}})}}} \cdot {\Pr\left( {{\mathcal{M}\left( \mathcal{D}^{\prime} \right)} = r} \right)}^{1/\alpha}}}} & (8)\end{matrix}$

In the step from Equation (6) to (7), probability preservation properties are used, which can also be used to prove that RDP guarantees convert to (ϵ, δ) guarantees. In the context of this proof, it is implied that ϵ-DP holds when Pr(M(D′)=r)>δ^(α/(α−1))·e^(−ϵ_(RDP)), since otherwise Pr(M(D)=r)<δ. It can therefore be assumed that Pr(M(D′)=r)>δ^(α/(α−1))·e^(−ϵ_(RDP)), which holds with probability at least 1−δ, and continue from Equation (8):

$\begin{matrix}\begin{matrix}{\leq \frac{1}{1 + {e^{- {\epsilon_{RDP}{({1 - {1/\alpha}})}}} \cdot \left( {\delta^{\alpha/{({\alpha - 1})}} \cdot e^{- \epsilon_{RDP}}} \right)^{1/\alpha}}}} \\{= \frac{1}{1 + {e^{- \epsilon_{RDP}} \cdot \delta^{1/{({\alpha - 1})}}}}} \\{= \frac{1}{1 + {e^{- \epsilon_{RDP}} \cdot e^{{{- 1}/{({\alpha - 1})}}{\ln{({1/\delta})}}}}}} \\{= \frac{1}{1 + e^{- {({\epsilon_{RDP} + {{({\alpha - 1})}^{- 1}{\ln{({1/\delta})}}}})}}}}\end{matrix} & (9) \\{= \frac{1}{1 + e^{- \epsilon}}} & (10)\end{matrix}$

In the step from Equation (9) to (10), it is noted that the exponent perfectly matches the conversion from ϵ_(RDP) to ϵ.

Consequently, Corollary 1 holds with probability 1−δ for M_(Gau). Hence, the general confidence upper bound for (ϵ, δ)-differentially private mechanisms can be defined as follows:

Definition 5 (Expected Adversarial Confidence Bound)

For any (ϵ, δ)-differentially private mechanism, the expected bound on the strong probabilistic adversary's confidence on either dataset D, D′ is

ρ_(c)(ϵ,δ)=𝔼[ρ(ϵ)]=(1−δ)ρ(ϵ)+δ=ρ(ϵ)+δ(1−ρ(ϵ)).

Adaptive Posterior Belief Adversary.

A_(prob) computes posterior beliefs β(⋅) for datasets D and D′ and makes a guess arg max β(⋅). Therefore, the strong A_(prob) represents a naive Bayes classifier choosing an option w.r.t. the highest posterior probability. The input features are the results r observed by A_(prob), which are independently sampled and thus fulfill the i.i.d. assumption. Also, the noise distributions are known to A_(prob), thus making the naive Bayes classifier the strongest probabilistic adversary in our scenario.

A universal adversary against DP observes multiple subsequent function results and adapts once a new result r is obtained. To extend A_(prob) to an adaptive adversary A_(adapt), adaptive beliefs can be defined as provided below.

Definition 6 (Adaptive Posterior Belief)

Let D, D′ be neighboring datasets and M₁, M₂ be ϵ₁-, ϵ₂-differentially private mechanisms. If M₁(D) is executed first with posterior belief β₁(D), the adaptive belief for D after executing M₂(D) is:

${\beta_{2}\left( {\mathcal{D},{\beta_{1}(\mathcal{D})}} \right)} = \frac{{\Pr\left( {{\mathcal{M}_{2}(\mathcal{D})} = r} \right)} \cdot {\beta_{1}(\mathcal{D})}}{{{\Pr\left( {{\mathcal{M}_{2}(\mathcal{D})} = r} \right)} \cdot {\beta_{1}(\mathcal{D})}} + {{\Pr\left( {{\mathcal{M}_{2}\left( \mathcal{D}^{\prime} \right)} = r} \right)}\left( {1 - {\beta_{1}(\mathcal{D})}} \right)}}$

Given k iterative independent function evaluations, β_(k)(D) is written to mean β_(k)(D, β_(k−1)(D, . . . )). To compute β_(k)(D), the adaptive adversary A_(adapt) computes adaptive posterior beliefs as specified by Algorithm 1.

Algorithm 1 Strong Adaptive Adversary

Input: datasets D, D′, mechanism outputs R = r₁, . . . , r_(k), mechanisms M₁, . . . , M_(k)
Output: β_(k)(D), β_(k)(D′)
1: β₀(D), β₀(D′) ← 0.5
2: for i ∈ {1, . . . , k} do
3:  p_(D) ← Pr(M_(i)(D) = r_(i))
4:  p_(D′) ← Pr(M_(i)(D′) = r_(i))
5:  β_(i)(D) ← β_(i−1)(D) · p_(D)/(p_(D) · β_(i−1)(D) + p_(D′) · β_(i−1)(D′))
6:  β_(i)(D′) ← β_(i−1)(D′) · p_(D′)/(p_(D) · β_(i−1)(D) + p_(D′) · β_(i−1)(D′))
7: end for

The calculation of β_(k)(D) and β_(k)(D′) as presented in Algorithm 1 can also be expressed as a closed form calculation which can be used later to further analyze the attacker.

$\begin{matrix}{{\beta_{k}(\mathcal{D})} = \frac{\prod_{i = 1}^{k}{\Pr\left( {{\mathcal{M}_{i}(\mathcal{D})} = r_{i}} \right)}}{{\prod_{i = 1}^{k}{\Pr\left( {{\mathcal{M}_{i}(\mathcal{D})} = r_{i}} \right)}} + {\prod_{i = 1}^{k}{\Pr\left( {{\mathcal{M}_{i}\left( \mathcal{D}^{\prime} \right)} = r_{i}} \right)}}}} \\{= \frac{1}{1 + \frac{\prod_{i = 1}^{k}{\Pr\left( {{\mathcal{M}_{i}\left( \mathcal{D}^{\prime} \right)} = r_{i}} \right)}}{\prod_{i = 1}^{k}{\Pr\left( {{\mathcal{M}_{i}(\mathcal{D})} = r_{i}} \right)}}}}\end{matrix}$

Aspects of the associated proof are provided below, in which it is assumed that the attacker starts with uniform priors. Thus, β₁(D) is calculated to be:

$\begin{matrix}{{\beta_{1}(\mathcal{D})} = \frac{\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)}{{\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)} + {\Pr\left( {{\mathcal{M}_{1}\left( \mathcal{D}^{\prime} \right)} = r_{1}} \right)}}} \\{= \frac{1}{1 + \frac{\Pr\left( {{\mathcal{M}_{1}\left( \mathcal{D}^{\prime} \right)} = r_{1}} \right)}{\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)}}}\end{matrix}$

In the second step, β₁(D) is used as the prior, hence β₂(D) is calculated as:

$\begin{matrix}{{\beta_{2}(\mathcal{D})} = \frac{{\Pr\left( {{\mathcal{M}_{2}(\mathcal{D})} = r_{2}} \right)} \cdot {\beta_{1}(\mathcal{D})}}{\begin{matrix}{{{\Pr\left( {{\mathcal{M}_{2}(\mathcal{D})} = r_{2}} \right)} \cdot {\beta_{1}(\mathcal{D})}} +} \\{{\Pr\left( {{\mathcal{M}_{2}\left( \mathcal{D}^{\prime} \right)} = r_{2}} \right)} \cdot \left( {1 - {\beta_{1}(\mathcal{D})}} \right)}\end{matrix}}} \\{= \frac{1}{1 + \frac{\begin{matrix}{{\Pr\left( {{\mathcal{M}_{2}\left( \mathcal{D}^{\prime} \right)} = r_{2}} \right)} -} \\{{\Pr\left( {{\mathcal{M}_{2}\left( \mathcal{D}^{\prime} \right)} = r_{2}} \right)} \cdot {\beta_{1}(\mathcal{D})}}\end{matrix}}{{\Pr\left( {{\mathcal{M}_{2}(\mathcal{D})} = r_{2}} \right)} \cdot {\beta_{1}(\mathcal{D})}}}} \\{= \frac{1}{1 + \frac{{\Pr\left( {{\mathcal{M}_{2}\left( \mathcal{D}^{\prime} \right)} = r_{2}} \right)} - \frac{\begin{matrix}{{\Pr\left( {{\mathcal{M}_{2}\left( \mathcal{D}^{\prime} \right)} = r_{2}} \right)} \cdot} \\{\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)}\end{matrix}}{\begin{matrix}{{\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)} +} \\{\Pr\left( {{\mathcal{M}_{1}\left( \mathcal{D}^{\prime} \right)} = r_{1}} \right)}\end{matrix}}}{\frac{{\Pr\left( {{\mathcal{M}_{2}(\mathcal{D})} = r_{2}} \right)} \cdot {\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)}}{{\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)} + {\Pr\left( {{\mathcal{M}_{1}\left( \mathcal{D}^{\prime} \right)} = r_{1}} \right)}}}}} \\{= \frac{1}{1 + \frac{{\Pr\left( {{\mathcal{M}_{2}\left( \mathcal{D}^{\prime} \right)} = r_{2}} \right)} \cdot {\Pr\left( {{\mathcal{M}_{1}\left( \mathcal{D}^{\prime} \right)} = r_{1}} \right)}}{{\Pr\left( {{\mathcal{M}_{2}(\mathcal{D})} = r_{2}} \right)} \cdot {\Pr\left( {{\mathcal{M}_{1}(\mathcal{D})} = r_{1}} \right)}}}}\end{matrix}$

This scheme continues for all k iterations by induction.

Even though the closed form provides an efficient calculation scheme for β_(k)(D), numerical issues can be experienced, so Algorithm 1 can be used for practical simulation of A_(adapt). However, by applying the closed form, it can be shown that A_(adapt) operates as assumed by the sequential composition theorem (cf. Theorem 3), which substantiates the strength of A_(adapt). It is also noted that β₁(D) has the same form as β_(k)(D), since the multiplication of two Gaussian distributions results in another Gaussian distribution. Therefore, the composition of several Gaussian mechanisms can be regarded as a single execution of a multidimensional mechanism with an adjusted privacy guarantee.

Theorem 5 (Composition of Adaptive Beliefs).

Let D, D′ be neighboring datasets and M₁, . . . , M_(k) be an arbitrary sequence of mechanisms providing ϵ₁, . . . , ϵ_(k)-Differential Privacy, then

$\beta_{k}(\mathcal{D}) \leq \rho\!\left( \sum_{i = 1}^{k}\epsilon_{i} \right) \quad (11)$

By using Definition 1 and δ=0, the following can be bound:

$\begin{aligned}
\beta_{k}(\mathcal{D}) &= \frac{1}{1 + \frac{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}^{\prime}) = r_{i} \right)}{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}) = r_{i} \right)}} \leq \frac{1}{1 + \frac{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}^{\prime}) = r_{i} \right)}{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}^{\prime}) = r_{i} \right)\cdot e^{\epsilon_{i}}}} \\
&= \frac{1}{1 + \prod_{i = 1}^{k} e^{-\epsilon_{i}}} = \frac{1}{1 + e^{-\sum_{i = 1}^{k}\epsilon_{i}}} = \rho\!\left( \sum_{i = 1}^{k}\epsilon_{i} \right) \;\square
\end{aligned}$

This demonstrates that in the worst case A_(adapt) takes full advantage of the composition of ϵ. But what about the case where δ>0? The same σ can be used in all dimensions if it is assumed that the privacy budget (ϵ, δ) is split equally s.t. ϵ_(i)=ϵ_(j) and δ_(i)=δ_(j), which, given previous assumptions, leads to σ_(i)=σ_(j) for all i, j∈{1, . . . , k}. The following can be transformed:

$\begin{aligned}
\beta_{k}(\mathcal{D}) &= \frac{1}{1 + \frac{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}^{\prime}) = r_{i} \right)}{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}) = r_{i} \right)}} \quad (12)\\
&\leq \frac{1}{1 + \prod_{i = 1}^{k} e^{-\epsilon_{i}}} = \frac{1}{1 + e^{-\sum_{i = 1}^{k}\epsilon_{i}}} = \rho\!\left( \sum_{i = 1}^{k}\epsilon_{i} \right) \quad (13)
\end{aligned}$

In the step from Equation (12) to (13), simplifications from Equations (6) to (10) in Theorem 4 are used. This short proof demonstrates that A_(adapt) behaves as expected by sequential composition theorems also for the (ϵ, δ)-differentially private Gaussian mechanism.

To take advantage of RDP composition, simplifications from Equation (6)to (9) can be used. The following transformations can be utilized:

$\begin{aligned}
\beta_{k}(\mathcal{D}) &= \frac{1}{1 + \frac{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}^{\prime}) = r_{i} \right)}{\prod_{i = 1}^{k}\Pr\left( \mathcal{M}_{i}(\mathcal{D}) = r_{i} \right)}} = \frac{1}{1 + \prod_{i = 1}^{k}\frac{\Pr\left( \mathcal{M}_{i}(\mathcal{D}^{\prime}) = r_{i} \right)}{\Pr\left( \mathcal{M}_{i}(\mathcal{D}) = r_{i} \right)}} \quad (14)\\
&\leq \frac{1}{1 + \prod_{i = 1}^{k} e^{-\left( \epsilon_{RDP,i} + (\alpha - 1)^{-1}\ln(1/\delta) \right)}} = \frac{1}{1 + e^{-\left( \sum_{i = 1}^{k}\epsilon_{RDP,i} + k(\alpha - 1)^{-1}\ln(1/\delta) \right)}} \quad (15)\\
&= \rho\!\left( \sum_{i = 1}^{k}\epsilon_{RDP,i} + (\alpha - 1)^{-1}\ln\left( 1/\delta^{k} \right) \right) \quad (16)
\end{aligned}$

Equation (16) implies that an RDP-composed bound can be achieved with acomposed δ value of δ^(k). It is known that sequential compositionresults in a composed δ value of kδ. Since δ^(k)<kδ, RDP offers astronger (ϵ, δ) guarantee for the same ρ_(c). This behavior can also beinterpreted as the fact that holding the composed (ϵ, δ) guaranteeconstant, the value of ρ_(c) is greater when sequential composition isused compared to RDP. Therefore, RDP offers a tighter bound for ρ_(c)under composition.

Expected Membership Advantage.

The adaptive posterior belief adversary allows the DP guarantee (ϵ, δ) to be transformed into a scalar measure ρ_(c) indicating whether A_(adapt) can confidently re-identify an individual's record in a dataset. From an individual's point of view, deniability is of interest, i.e., if A_(adapt) has low confidence, an individual can plausibly deny that the hypothesis of A_(adapt) is correct. A resulting question concerns how often a guess by A_(adapt) about the presence of an individual is actually correct, or what A_(adapt)'s advantage is. As described above, it can be assumed that A_(adapt) operates as a naive Bayes classifier with known probability distributions. Looking at the decision boundary of the classifier (i.e., when to choose D or D′) for M_(Gau) with different (ϵ, δ) guarantees, it is found that the decision boundary does not change as long as the PDFs are symmetric. For example, consider a scenario with given datasets D, D′ and a query ƒ: DOM→R that yields ƒ(D)=0 and ƒ(D′)=1. Furthermore, assume w.l.o.g. that Δƒ₂=1.

FIGS. 2A and 2B are diagrams 200 a, 200 b, respectively illustrating decision boundaries based on PDFs and confidence. It is shown that the decision boundary of A_(adapt) does not change when increasing the privacy guarantee, since (ϵ, δ) causes the PDFs of D and D′ to become squeezed. Thus, A_(adapt) will exclusively choose D if a value is sampled from the left, red region, and vice versa for D′ in the right, blue region. Still, confidence towards either decision declines.

If a (6, 10⁻⁶)-DP M_(Gau) is applied to perturb the results of ƒ, A_(adapt) has to choose between the two PDFs with solid lines in FIG. 2A based on the output M_(Gau)(⋅)=r. FIG. 2B visualizes the resulting posterior beliefs for D, D′ (solid lines), the highest of which A_(adapt) chooses given r. The regions where A_(adapt) chooses D are shaded red in both figures, and regions that result in the choice D′ are shaded blue. Increasing the privacy guarantee to (3, 10⁻⁶)-DP (dashed lines in the figures) squeezes the PDFs and confidence curves. However, the decision boundaries of the regions at which A_(adapt) chooses a certain dataset stay the same. Thus, it is important to note that holding r constant and reducing (ϵ, δ) solely affects the posterior beliefs of A_(adapt), not the choice (i.e., the order from most to least confident is maintained even while the maximum posterior belief is lowered).

However, the information "How likely is an adversary to guess the dataset in which I have participated?" is expected to be a major point of interest when interpreting DP guarantees in iterative evaluations of ƒ, like those found in data science use cases such as machine learning. Expected membership advantage ρ_(α) can be defined as the difference between the probabilities of A_(adapt) correctly identifying D (true positive rate) and of A_(adapt) misclassifying a member of D′ as belonging to D (false positive rate), as in 40. The worst-case advantage ρ_(α)=1 occurs in the case in which M always samples on the side of the decision boundary that belongs to the true dataset D. In contrast to the analysis of ρ_(c), ρ_(α) will not give a worst-case bound, but an average-case estimation. Since A_(adapt) is a naive Bayes classifier, the properties of normal distributions can be used. Denoting the multidimensional region where A_(adapt) chooses D as D_(c) and the region where A_(adapt) chooses D′ as D_(i):

ρ_(α)=Pr(Success)−Pr(Error):=Pr(A_(adapt)=D | D)−Pr(A_(adapt)=D | D′)=∫_(D_(c)) Pr(M(D)=r)dr−∫_(D_(i)) Pr(M(D)=r)dr,

where the last equality follows from the symmetry of the perturbation PDFs.

The corresponding regions of error for the previous example are visualized in diagrams 300 a, 300 b of FIGS. 3A and 3B. If M_(Gau) is applied to achieve (ϵ, δ)-DP, the exact membership advantage of A_(adapt) can be determined analytically. Two multidimensional Gaussian PDFs (i.e., M_(Gau)(D), M_(Gau)(D′)) with known covariance matrix Σ and known means μ₁=ƒ(D), μ₂=ƒ(D′) can be considered.

$\begin{matrix}\begin{matrix}{{\Pr({Success})} = {1 - {\Pr({Error})}}} \\{{= {1 - {\Phi\left( {{- \Delta}/2} \right)}}},}\end{matrix} & (17)\end{matrix}$

where Φ is the cumulative distribution function (CDF) of the standard normal distribution and Δ=√((μ₁−μ₂)^(T)Σ⁻¹(μ₁−μ₂)) is the Mahalanobis distance. Adding independent noise in all dimensions with Σ=σ²·I, the Mahalanobis distance simplifies to

$\Delta = \frac{\|\mu_{1} - \mu_{2}\|_{2}}{\sigma}.$

Definition 7 (Bound on the Expected Adversarial Membership Advantage)

For the (ϵ, δ)-differentially private Gaussian mechanism, the expected membership advantage of the strong probabilistic adversary on either dataset D, D′ is

$\begin{aligned}
\rho_{\alpha}(\epsilon,\delta) &= \Phi(\Delta/2) - \Phi(-\Delta/2) \\
&= \Phi\!\left( \frac{\|\mu_{1}-\mu_{2}\|_{2}}{2\sigma} \right) - \Phi\!\left( -\frac{\|\mu_{1}-\mu_{2}\|_{2}}{2\sigma} \right) \\
&= \Phi\!\left( \frac{\|\mu_{1}-\mu_{2}\|_{2}}{2\Delta f_{2}\left( \sqrt{2\ln(1.25/\delta)}/\epsilon \right)} \right) - \Phi\!\left( -\frac{\|\mu_{1}-\mu_{2}\|_{2}}{2\Delta f_{2}\left( \sqrt{2\ln(1.25/\delta)}/\epsilon \right)} \right) \\
&\leq \Phi\!\left( \frac{1}{2\left( \sqrt{2\ln(1.25/\delta)}/\epsilon \right)} \right) - \Phi\!\left( -\frac{1}{2\left( \sqrt{2\ln(1.25/\delta)}/\epsilon \right)} \right)
\end{aligned}$

Again, the current framework can express (ϵ, δ) guarantees with δ>0 via a scalar value ρ_(α). However, a specific membership advantage can be computed individually for different kinds of mechanisms M.

Above, it was evaluated how the confidence of A_(adapt) changes under composition. A similar analysis of the membership advantage under composition is required. Again, the elucidations can be restricted to the Gaussian mechanism. As shown above, the k-fold composition of M_(Gau,i), each step guaranteeing (α, ϵ_(RDP,i))-RDP, can be represented by a single execution of M_(Gau) with k-dimensional output guaranteeing (α, ϵ_(RDP)=kϵ_(RDP,i))-RDP. For this proof, it can be assumed that each of the composed mechanism executions has the same sensitivity ∥μ_(1,i)−μ_(2,i)∥₂=Δƒ₂. A single execution of M_(Gau) can be analyzed with the tools described above. Definition 7 yields

$\begin{aligned}
\rho_{\alpha} &= \Phi(\Delta/2) - \Phi(-\Delta/2) \\
&= \Phi\!\left( \frac{\|\mu_{1}-\mu_{2}\|_{2}}{2\sigma} \right) - \Phi\!\left( -\frac{\|\mu_{1}-\mu_{2}\|_{2}}{2\sigma} \right) \\
&= \Phi\!\left( \frac{\sqrt{k}\,\|\mu_{1,i}-\mu_{2,i}\|_{2}}{2\Delta f_{2}\sqrt{\alpha/(2\epsilon_{RDP,i})}} \right) - \Phi\!\left( -\frac{\sqrt{k}\,\|\mu_{1,i}-\mu_{2,i}\|_{2}}{2\Delta f_{2}\sqrt{\alpha/(2\epsilon_{RDP,i})}} \right) \\
&= \Phi\!\left( \frac{\sqrt{k}}{2\sqrt{\alpha/(2\epsilon_{RDP,i})}} \right) - \Phi\!\left( -\frac{\sqrt{k}}{2\sqrt{\alpha/(2\epsilon_{RDP,i})}} \right) \\
&= \Phi\!\left( \sqrt{\frac{k\,\epsilon_{RDP,i}}{2\alpha}} \right) - \Phi\!\left( -\sqrt{\frac{k\,\epsilon_{RDP,i}}{2\alpha}} \right) \\
&= \Phi\!\left( \sqrt{\frac{\epsilon_{RDP}}{2\alpha}} \right) - \Phi\!\left( -\sqrt{\frac{\epsilon_{RDP}}{2\alpha}} \right)
\end{aligned}$

The result shows that the strategy of A_(adapt) fully takes advantage of the RDP composition properties of ϵ_(RDP,i) and α. As expected, ρ_(α) takes on the same value regardless of whether k composition steps with ϵ_(RDP,i) or a single composition step with ϵ_(RDP) are carried out.
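A minimal sketch evaluating this composed bound, assuming SciPy's standard normal CDF; parameter values are illustrative:

from math import sqrt

from scipy.stats import norm


def composed_membership_advantage(eps_rdp_step: float, alpha: float, k: int) -> float:
    """rho_alpha after k-fold RDP composition: Phi(sqrt(k*eps_RDP,i/(2*alpha))) - Phi(-...)."""
    arg = sqrt(k * eps_rdp_step / (2.0 * alpha))
    return norm.cdf(arg) - norm.cdf(-arg)


# Same value whether composed as k steps or as one step with eps_RDP = k * eps_RDP,i.
print(composed_membership_advantage(eps_rdp_step=0.05, alpha=10.0, k=100))
print(composed_membership_advantage(eps_rdp_step=5.0, alpha=10.0, k=1))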

Privacy Regimes.

With confidence ρ_(c) and expected membership advantage ρ_(α), two measures were defined that taken together form the current framework for interpreting DP guarantees (ϵ, δ). While ρ_(α) indicates the likelihood with which A_(adapt) is to discover any participant's data correctly, ρ_(c) complements this information with the plausibility with which any participant in the data can argue that A_(adapt)'s guess is incorrect. Here, it is demonstrated how the current framework can be applied to measure the level of protection independent of any particular dataset D. Furthermore, several allegedly secure (ϵ, δ) pairs suggested in the literature are revisited and their protection is interpreted. Finally, general guidance is provided for realizing high, mid, or low to no privacy regimes.

The interpretability framework can be applied in two steps. First, the participants in the dataset receive a predefined (ϵ, δ) guarantee. This (ϵ, δ) guarantee is based on the maximum tolerable decrease in utility (e.g., accuracy) of a function ƒ evaluated by a data analyst. The participants interpret the resulting tuple (ρ_(c), ρ_(α)) w.r.t. their protection. Each participant can either reject or accept the use of their data by the data analyst. Second, participants are free to suggest an (ϵ, δ) based on the corresponding adversarial confidence and membership advantage (ρ_(c), ρ_(α)), which is in turn evaluated by the data analyst w.r.t. the expected utility of ƒ. To enable participants to perform this matching, general curves of ρ_(c) and ρ_(α) are provided for different (ϵ, δ) as shown in diagrams 400 a, 400 b of FIGS. 4A, 4B. For ρ_(α), the curves are specific to M_(Gau). In contrast, ρ_(c) is independent of M. To compute both measures, Definition 5 and Definition 7 can be used. It can also be assumed w.l.o.g. that ƒ(D)=(0₁, 0₂, . . . , 0_(k)) and ƒ(D′)=(1₁, 1₂, . . . , 1_(k)) for all dimensions k. Thus, ƒ(D) and ƒ(D′) are maximally distinguishable, resulting in Δƒ₂=√k.

FIG. 4A illustrates that there is no significant difference between the expected worst-case confidence of A_(adapt) for ϵ-DP and (ϵ, δ)-DP for 0<δ<0.1. In contrast, ρ_(α) strongly depends on the choice of δ, as depicted in FIG. 4B. For example, ρ_(α) is low for (2, 10⁻⁶)-DP, indicating that the probability of A_(adapt) choosing D is similar to choosing D′. Yet, the corresponding ρ_(c) is high, which provides support for a guess of A_(adapt) being correct. With these implications in mind, data owners and data analysts are empowered to discuss acceptable privacy guarantees.

Validation Over Synthetic Data.

The following demonstrates how the confidence and membership advantage of A_(adapt) develop in an empirical example. The evaluation characterizes how well ρ_(c) and ρ_(α) actually model the expected membership advantage risk for data members and how effectively A_(adapt) behaves on synthetic data. As A_(adapt) is assumed to know all data members except for one, the size of U does not influence her. For this reason, the following tiny data universe U, true dataset D and alternative D′ presented in Tables 2, 3 and 4 were used. Let U represent a set of employees that were offered to participate in a survey about their hourly wage. Alice, Bob and Carol participated. Dan did not. Thus, the survey data D consists of 3 entries. The survey owner allows data analysts to pose queries to D until a DP budget of (ϵ=5, δ=0.01) is consumed. A_(adapt) is the data analyst that queries D. Aside from learning statistics about the wage, A_(adapt) is also interested in knowing who participated. So far, she knows that Alice and Carol participated for sure and that there are three people in total. Thus, she has to decide between D and D′, i.e., whether Bob or Dan is the missing entry. As side information, she knows that the employer pays at least $1 and a maximum of $10. As a consequence, when A_(adapt) is allowed to ask only the sum query function, S(ƒ)=Δƒ₂=9. Further, the Gaussian mechanism is known to be used for anonymization.

TABLE 2

Name — Wage
Alice — $5
Bob — $10
Carol — $2
Dan — $1

TABLE 3

Name — Wage
Alice — $5
Bob — $10
Carol — $2

TABLE 4

Name — Wage
Alice — $5
Dan — $1
Carol — $2
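A minimal Python sketch simulating this survey example with the adaptive adversary of Algorithm 1; the per-query noise scale below is an illustrative placeholder rather than the RDP-calibrated value described in the text:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

sum_d = 5 + 10 + 2        # f(D) from Table 3 (Alice, Bob, Carol)
sum_d_prime = 5 + 1 + 2   # f(D') from Table 4 (Alice, Dan, Carol)
sigma = 30.0              # illustrative per-query noise scale; the text derives it via RDP
k = 100

belief_d, belief_d_prime = 0.5, 0.5
for _ in range(k):
    r = rng.normal(loc=sum_d, scale=sigma)                # D is the true dataset
    p_d = norm.pdf(r, loc=sum_d, scale=sigma)
    p_dp = norm.pdf(r, loc=sum_d_prime, scale=sigma)
    denom = p_d * belief_d + p_dp * belief_d_prime
    belief_d, belief_d_prime = belief_d * p_d / denom, belief_d_prime * p_dp / denom

print(belief_d)           # A_adapt's final confidence that Bob (dataset D) participated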

Given this prior information, A_(adapt) iteratively updates her belief on D and D′ after each query. She makes a final guess after the whole (ϵ, δ)-budget has been used. By using the current framework (ρ_(c), ρ_(α)), data members (especially Bob in this case) can compute their protection guarantee: What is the advantage of A_(adapt) in disclosing a person's participation (i.e., ρ_(α))? How plausibly can that person deny a revelation by A_(adapt) (i.e., ρ_(c))? Referring to Definition 7 and Definition 5, ρ_(α)(ϵ=5, δ=0.01)=0.5 is computed under composition and ρ_(c)(ϵ=5, δ=0.01)=0.99, illustrating that the risk of re-identification is quite high and the deniability extremely low. However, to show whether A_(adapt) actually reaches those values, her behavior can be empirically analyzed by iteratively querying D and applying Algorithm 1 after each query. k=100 queries can be used and the experiment can be repeated 10,000 times to estimate the membership advantage and show the distribution of confidence at the end of each run. As it is known that the adversary will compose k times, the RDP composition scheme can be used to determine the noise scale to be applied for each individual query. In diagram 500a of FIG. 5a, the adaptive posterior beliefs for D and D′ are depicted. After some fluctuation, the belief for D starts growing up to 0.90. Consequently, the final guess of A_(adapt) is D, which is correct. The guesses over all runs are summarized in diagram 500b of FIG. 5(b). Here, it is seen that about 75% of the guesses of A_(adapt) are correct, which corresponds exactly to the expected membership advantage of our threat model. However, the predicted upper bound of 0.99 was not reached in the sample run. In contrast to ρ_(α), ρ_(c) is a worst-case bound, not an expectation. Thus, the final belief of A_(adapt) approaches ρ_(c) very rarely.
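
To make the belief update concrete, the following is a minimal Python sketch of the iterative posterior-belief computation on the survey sums from Tables 3 and 4. The Bayesian update mirrors the form used in line 11 of Algorithm 2 below; the noise scale σ is an illustrative placeholder rather than the value derived via RDP composition for the (ϵ=5, δ=0.01) budget, and the closing ρ_(c) computation simply inverts the relation ϵ=log(ρ_(c)/(1−ρ_(c))) stated elsewhere herein.

```python
import numpy as np

# Survey sums from Tables 3 and 4: f(D) = 5 + 10 + 2 = 17, f(D') = 5 + 1 + 2 = 8.
f_D, f_D_prime = 17.0, 8.0
sensitivity = abs(f_D - f_D_prime)        # S(f) = Delta f_2 = 9

# Illustrative noise scale; the experiments derive sigma for k composed queries
# under (epsilon=5, delta=0.01) via RDP composition, which is not reproduced here.
sigma = 9.0 * sensitivity
rng = np.random.default_rng(0)

def gaussian_kernel(x, mean, sigma):
    """Unnormalized Gaussian density; the normalization cancels in the Bayes ratio."""
    return np.exp(-((x - mean) ** 2) / (2 * sigma ** 2))

belief_D, belief_D_prime = 0.5, 0.5       # equal prior beliefs
k = 100                                    # number of composed sum queries
for _ in range(k):
    observation = f_D + rng.normal(0.0, sigma)   # the true dataset is D
    p_D = belief_D * gaussian_kernel(observation, f_D, sigma)
    p_Dp = belief_D_prime * gaussian_kernel(observation, f_D_prime, sigma)
    belief_D, belief_D_prime = p_D / (p_D + p_Dp), p_Dp / (p_D + p_Dp)

print(f"final belief in D: {belief_D:.3f}")
# Worst-case bound implied by epsilon = log(rho_c / (1 - rho_c)):
eps = 5.0
print(f"rho_c for eps=5: {np.exp(eps) / (1 + np.exp(eps)):.3f}")
```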

To illustrate this phenomenon, a histogram over the beliefs at the end of each run is shown for various choices of ϵ in diagram 600 of FIG. 6. FIG. 6 illustrates that the predicted worst-case bound was reached only in a small proportion of runs. This effect becomes more visible when ϵ is lower.

A final note on δ, which describes the probability of exceeding ρ_(c): when looking closely at the histograms, one can see that there are some (small) columns for a range of values that are larger than the worst-case bound. Their proportion of all runs can be calculated, e.g., 0.0008 for ϵ=5, which is less than the expected δ=0.01.

Application to Deep Learning.

A natural question arising is how the introduced adversary A_(adapt) behaves on real data and high-dimensional, iterative differentially private function evaluations. Such characteristics are typically found in deep learning classification tasks. Here, a neural network (NN) is provided a training dataset D to learn a prediction function ŷ=ƒ_(nn)(x) given (x, y)ϵD. Learning is achieved by means of an optimizer. Afterwards, the accuracy of the learned prediction function ƒ_(nn)(⋅) is tested on a dataset D^(test).

A variety of differentially private optimizers for deep learning can be utilized. These optimizers represent a differentially private training mechanism M_(nn)(ƒ_(θ)(⋅)) that updates the weights θ_(t) per training step tϵT with θ_(t)←θ_(t−1)−α·{tilde over (g)}, where α&gt;0 is the learning rate and {tilde over (g)} denotes the Gaussian perturbed gradient (cf. Definition 2). After T update steps, where each update step is itself an application of M_(Gau)(ƒ_(θ)(⋅)), the algorithm outputs a differentially private weight matrix θ which is then used in the prediction function ƒ_(nn)(⋅). Considering the evaluation of ƒ_(nn)(⋅) given (x, y)ϵD as post-processing of the trained weights θ, it is found that the prediction ŷ=ƒ_(nn)(x) is (ϵ, δ)-differentially private too.
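
For illustration, a single differentially private update step of the kind described above can be sketched as follows; the per-example gradients, clipping norm C, noise multiplier σ and learning rate α are placeholders, and the sketch follows the common clip-average-perturb pattern rather than reproducing the exact optimizer used herein.

```python
import numpy as np

def dp_update_step(theta, per_example_grads, clip_norm, sigma, learning_rate, rng):
    """One differentially private update: clip each per-example gradient to C,
    average, perturb the sum with Gaussian noise scaled to sigma * C, descend."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g / max(1.0, norm / clip_norm))    # bound each example's influence to C
    g_hat = np.mean(clipped, axis=0)                       # batch gradient
    noise = rng.normal(0.0, sigma * clip_norm, size=g_hat.shape)
    g_tilde = g_hat + noise / len(per_example_grads)       # Gaussian-perturbed gradient
    return theta - learning_rate * g_tilde                 # theta_t = theta_{t-1} - alpha * g_tilde
```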

It is assumed that A_(adapt) desires to correctly identify the dataset that was utilized for training when having the choice between D and D′. There are two variations of DP: bounded and unbounded. In bounded DP, it holds that |D|=|D′|. However, differentially private deep learning optimizers such as the one utilized herein consider unbounded DP as the standard case, in which |D|−|D′|=1. Furthermore, A_(adapt) can be assumed to possess the following information: the initial weights θ₀, the perturbed gradients {tilde over (g)} after every epoch, the values of privacy parameters (ϵ, δ), and sensitivity Δƒ₂=C equal to the clipping norm. Here, Δƒ₂ refers to the sensitivity with which noise added by a mechanism is scaled, not necessarily the global sensitivity. In some experiments, for example, Δƒ₂=S(ƒ), which expresses the true difference between ƒ(D) and ƒ(D′), as in Definition 3. The assumptions are analogous to those of white-box membership inference attacks. The attack itself is based on calculating clipped gradients ĝ(D, θ_(t)) and ĝ(D′, θ_(t)) for each training step tϵT, finding β(D) for that training step given the observed perturbed gradient {tilde over (g)}_(t), and calculating θ_(t+1) by applying {tilde over (g)}_(t).

Above, sensitivity was set to Δƒ₂=S(ƒ)=∥ƒ(D)−ƒ(D′)∥₂, the true difference between the sums of wages resulting from Bob's and Dan's participation in the survey. In order to create a comparable experiment for differentially private deep learning, the difference between gradients that can be obscured by noise for each epoch is S(ƒ_(θ_(t)))=∥n·ĝ(D, θ_(t))−(n−1)·ĝ(D′, θ_(t))∥₂. C bounds the influence of a single training example on training by clipping each per-example gradient to the chosen value of C; although this value bounds the influence of a single example on the gradient, this bound is loose. If S(ƒ_(θ_(t)))&lt;&lt;C, the adversary confidence β(D) would be very small in every case when Δƒ₂=C, as is the case in most implementations of differentially private neural networks. This behavior is due to the fact that an assumption for Equation (4) does not hold, since Δƒ₂≠S(ƒ_(θ_(t))). To address this challenge in differentially private deep learning, Δƒ₂=S(ƒ_(θ_(t))) can be adaptively set. Choosing Δƒ₂ this way is analogous to using local sensitivity in differential privacy.
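
A minimal sketch of this adaptive sensitivity choice is shown below, assuming the batch gradients ĝ_(t)(D) and ĝ_(t)(D′) are available as arrays, as they are for the white-box adversary described above.

```python
import numpy as np

def adaptive_sensitivity(g_hat_D, g_hat_D_prime, n):
    """S(f_theta_t) = || n * g_hat(D) - (n - 1) * g_hat(D') ||_2 in the unbounded
    setting, where D holds n records and D' holds n - 1 records."""
    return float(np.linalg.norm(n * np.asarray(g_hat_D) - (n - 1) * np.asarray(g_hat_D_prime)))

# Delta f_2 can then be set per step to S(f_theta_t), as proposed above, instead of
# the loose global choice Delta f_2 = C (unbounded) or Delta f_2 = 2 * C (bounded).
```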

Algorithm 2 Strong Adaptive Adversary in Deep Learning

Require: Datasets D and D′ with n and n−1 records D_(i) and D_(i)′, respectively, training steps T, cost function J(θ), perturbed gradients {tilde over (g)}_(t) for each training step t≤T, initial weights θ₀, prior beliefs β₀(D)=β₀(D′)=0.5, learning rate α, clipping threshold C, and mechanism M.

Ensure: Adversary confidence β_(T)(D)

1: for t ∈ [T] do
2:   Compute gradients: for each i ∈ D, D′, compute g_(t)(D_(i)) ← ∇_(θ_(t)) J(θ_(t), D_(i)) and g_(t)(D_(i)′) ← ∇_(θ_(t)) J(θ_(t), D_(i)′)
3:   Clip gradients:
4:   Clip each g_(t)(D_(i)), g_(t)(D_(i)′) for i ∈ D, D′ to a maximum L² norm C using ḡ_(t)(D_(i)) ← g_(t)(D_(i))/max(1, ∥g_(t)(D_(i))∥₂/C) and ḡ_(t)(D_(i)′) ← g_(t)(D_(i)′)/max(1, ∥g_(t)(D_(i)′)∥₂/C)
5:   Calculate batch gradients:
6:   ĝ_(t)(D) ← avg(ḡ_(t)(D_(i)))
7:   ĝ_(t)(D′) ← avg(ḡ_(t)(D_(i)′))
8:   Calculate sensitivity:
9:   Δƒ_(t) ← ∥n·ĝ_(t)(D) − (n−1)·ĝ_(t)(D′)∥₂
10:  Calculate belief:
11:  β_(t+1)(D) ← β_(t)(D)·Pr[M(ĝ_(t)(D))={tilde over (g)}_(t)] / (β_(t)(D)·Pr[M(ĝ_(t)(D))={tilde over (g)}_(t)] + β_(t)(D′)·Pr[M(ĝ_(t)(D′))={tilde over (g)}_(t)])
12:  Compute weights: θ_(t+1) ← θ_(t) − α·{tilde over (g)}_(t)
13: end for
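
The loop of Algorithm 2 can be sketched compactly in Python as follows; gradients are treated as flat vectors, the mechanism is an isotropic Gaussian whose normalization constants cancel in the belief ratio, and the per-example gradient function, noise multiplier and other inputs are placeholders rather than the exact experimental configuration.

```python
import numpy as np

def clip(g, C):
    """Clip a gradient vector to a maximum L2 norm C."""
    return g / max(1.0, np.linalg.norm(g) / C)

def log_gaussian_kernel(x, mean, sigma):
    """Log of the unnormalized isotropic Gaussian density; constants cancel in the ratio."""
    return -np.sum((x - mean) ** 2) / (2.0 * sigma ** 2)

def strong_adaptive_adversary(per_example_grad, D, D_prime, theta0, T, C, sigma, alpha, rng):
    """Sketch of Algorithm 2; returns the final belief beta_T(D).
    per_example_grad(theta, example) is assumed to return the gradient of J."""
    n = len(D)
    theta = theta0
    belief_D, belief_Dp = 0.5, 0.5
    for _ in range(T):
        # Clipped per-example gradients and batch averages for both candidate datasets.
        g_hat_D = np.mean([clip(per_example_grad(theta, x), C) for x in D], axis=0)
        g_hat_Dp = np.mean([clip(per_example_grad(theta, x), C) for x in D_prime], axis=0)
        # Adaptive sensitivity (unbounded setting).
        delta_f = np.linalg.norm(n * g_hat_D - (n - 1) * g_hat_Dp)
        # The mechanism perturbs the gradient computed on the true dataset D.
        g_tilde = g_hat_D + rng.normal(0.0, sigma * delta_f, size=g_hat_D.shape)
        # Bayesian belief update (line 11 of Algorithm 2), in log space for stability.
        log_p_D = np.log(belief_D) + log_gaussian_kernel(g_tilde, g_hat_D, sigma * delta_f)
        log_p_Dp = np.log(belief_Dp) + log_gaussian_kernel(g_tilde, g_hat_Dp, sigma * delta_f)
        z = np.logaddexp(log_p_D, log_p_Dp)
        belief_D, belief_Dp = np.exp(log_p_D - z), np.exp(log_p_Dp - z)
        # Weight update with the perturbed gradient (line 12).
        theta = theta - alpha * g_tilde
    return belief_D
```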

Based on the previously introduced assumptions and notation, Algorithm 1 can be adapted to bounded and unbounded, as well as global and S(ƒ_(θ))-based, settings. The adapted Strong Adaptive Adversary for differentially private deep learning is stated in Algorithm 2, which specifies A_(adapt) in an unbounded environment with Δƒ₂=S(ƒ_(θ)). For bounded differential privacy with Δƒ₂=S(ƒ_(θ)), Algorithm 2 can be adjusted s.t. D′ is defined to contain n records, and Δƒ₂=S(ƒ_(θ_(t)))=n·∥ĝ_(t)(D′)−ĝ_(t)(D)∥₂. To implement global unbounded differential privacy, Δƒ₂=C and D′ contains n−1 records. To implement global bounded differential privacy, D′ contains n records and Δƒ₂=2C, since the maximum influence of one example on the sum of per-example gradients is C. If one record is replaced with another, the lengths of the clipped gradients of these two records could each be C and point in opposite directions, which results in n·∥ĝ_(t)(D′)−ĝ_(t)(D)∥₂=2C. It is also noted that the same value of Δƒ₂ used by A_(adapt) can also be used by M to add noise.

For practical evaluation, a feed-forward NN for the MNIST dataset was built. For MNIST, the utilized NN architecture consists of two repetitions of a convolutional layer with kernel size (3, 3), batch normalization and max pooling with pool size (2, 2) before being flattened for the output layer. ReLU and softmax activation functions were used for the convolutional layers and the output layer, respectively.
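
One possible rendering of the described architecture is sketched below using Keras; the filter counts and input shape are assumptions for illustration, as only the kernel size (3, 3), pool size (2, 2), and the ReLU/softmax activations are stated above.

```python
import tensorflow as tf

def build_mnist_model(num_filters: int = 32) -> tf.keras.Model:
    """Two blocks of Conv(3x3) -> BatchNorm -> MaxPool(2x2), then flatten to a softmax output."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(num_filters, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Conv2D(num_filters, kernel_size=(3, 3), activation="relu"),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(pool_size=(2, 2)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
```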

One epoch represents the evaluation of all records in D. Thus, it is important to highlight that the number of update steps T varies in practice depending on the number of records from D used for calculating the DP gradient update ĝ. In mini-batch gradient descent, a number of b records from D is used for calculating an update and one epoch results in t=|D|/b update steps. In contrast, in batch gradient descent, all records in D are used for calculating the update and each epoch consists of a single update step. While the approaches vary in their speed of convergence due to the gradient update behavior (i.e., many small updates vs. few large updates), none of the approaches has hard limitations w.r.t. convergence of accuracy and loss. With the current subject matter, batch gradient descent was utilized and, given differentially private gradient updates {tilde over (g)} after any update step t, the previously introduced adversary A_(adapt) shall decide whether it was calculated on D or D′. It was assumed that A_(adapt) has equal prior beliefs of 0.5 on D and D′. The prior belief of A_(adapt) adapts at every step t according to (1).

In the experiments, relevant parameters were set as follows: training data |D|=100, epochs k=30, clipping norm C=3.0, learning rate α=0.005, δ=0.01, and ρ_(c)=0.9. These values correspond to ρ_(α)=25.62%.

TABLE 5

Empirical (ρ_(α), δ)    Δƒ₂ = S(ƒ_(θ))    Global Δƒ₂
Bounded DP              (0.240, 0.002)    (0.108, 0)
Unbounded DP            (0.250, 0.001)    (0.266, 0.001)

The empirically calculated values (i.e., after the training) for ρ_(α) and δ are presented in Table 5. The belief distributions for the described experiments can be found in diagrams 700, 800 of FIGS. 7-8.

Note that δ indeed bounds the percentage of experiments for which β_(T)(D)&gt;ρ_(c). For all experiments with Δƒ₂=S(ƒ_(θ)) and for global, unbounded DP, the empirical values of ρ_(α) match the analytical values. However, in global, bounded differential privacy the difference between correct guesses and incorrect guesses by A_(adapt) falls below ρ_(α). In this experiment, the percentage of experiments for which β_(T)(D)&gt;ρ_(c) is also far lower. This behavior confirms the hypothesis that C is loose, so global sensitivity results in a lower value of β_(T)(D), as is again confirmed by FIGS. 7b and 9a. It is also noted that the distributions in FIGS. 7a and 7c look identical to each other and to the distributions in FIG. 6 for the respective values of ρ_(c) and δ. This observation confirms that the strong adaptive adversary attack model is applicable for choosing the privacy parameter ϵ in deep learning.

The following investigates the reason for the similarities between unbounded differential privacy with Δƒ₂=S(ƒ_(θ)) and Δƒ₂=C, and also for the differences between FIGS. 7(a) and 7(b) concerning bounded differential privacy with Δƒ₂=S(ƒ_(θ)) and Δƒ₂=2C. In the unbounded case, the distributions seem identical in diagram 800 of FIG. 8, which occurs when Δƒ₂=S(ƒ_(θ))=∥(n−1)·ĝ_(t)(D′)−n·ĝ_(t)(D)∥₂=C, so the clipped per-example gradient of the differentiating example in D should have the length 3, which is equal to C. This hypothesis is confirmed with a glance at the development of ∥(n−1)·ĝ_(t)(D′)−n·ĝ_(t)(D)∥₂ in diagram 900a of FIG. 9. This behavior is not surprising, since all per-example gradients over the course of all epochs were greater than or close to C=3. In the bounded differential privacy experiments, Δƒ₂=S(ƒ_(θ))=n·∥ĝ_(t)(D′)−ĝ_(t)(D)∥₂≠2C, since the corresponding distributions in FIGS. 7(a) and 7(b), as well as FIG. 8, do not look identical. This expectation is confirmed by the plot of n·∥ĝ_(t)(D′)−ĝ_(t)(D)∥₂ in FIG. 9(a). This difference implies that the per-example gradients of the differentiating examples in D′ and D are less than 2C and do not point in opposite directions. It is also noted that the length of gradients tends to decrease over the course of training, a trend that can be observed in diagram 900a of FIG. 9(a), so if training converges to a point in which gradients are shorter than the chosen value of C, globally differentially private deep learning inherently offers a stronger privacy guarantee than was originally chosen.

Diagram 900(b) of FIG. 9(b) confirms that the differentially privately trained models in these experiments do, indeed, yield some utility. It was also observed that test accuracy is directly affected by the value of sensitivity Δƒ₂ chosen for noise addition. Since gradients in all four scenarios are clipped to the same value C, the only difference between training the neural networks is Δƒ₂. As visualized in FIG. 9(a), sensitivities for unbounded DP with Δƒ₂=S(ƒ_(θ)) and Δƒ₂=C were identical, so the nearly identical corresponding distributions in FIG. 9(b) do not come as a surprise.

Similarly, it is observed that Δƒ₂ is greater for global, bounded DP in FIG. 9(a), so utility is also lower for this case in FIG. 9(b). The unbounded DP case with Δƒ₂=S(ƒ_(θ)) yields the highest utility, which can be explained by the low value of Δƒ₂ that can be read from FIG. 9(a).

Relation to Membership Inference Threat Model.

The membership inference threat model and the analysis of A_(adapt) herein exhibit clear commonalities. Namely, they share the same overarching goal: to intuitively quantify the privacy offered by DP in a deep learning scenario. Both approaches aim to clarify the privacy risks associated with deep learning models.

Considering that A_(adapt) desires to identify the dataset used for training a NN, A_(adapt) is analogous to a membership inference adversary who desires to identify individual records in the training data. Furthermore, parallel to membership inference, A_(adapt) operates in the white-box model, observing the development of θ over all training steps t of a NN. In one approach, the adversary uses the training loss to infer membership of an example.

Although the general ideas overlap, A_(adapt) is far stronger than a membership inference adversary. Membership advantage quantifies the effectiveness of the practical membership inference attack and therefore provides a lower bound on information leakage, which adversaries with auxiliary information quickly surpass. A_(adapt) has access to arbitrary auxiliary information, including all data points in D and D′, staying closer to the original DP guarantees. Using A_(adapt), what the best possible adversary is able to infer can be calculated, and it can be seen that this adversary reaches the upper bound.

An adversarial game can be defined in which the adversary receives both datasets D and D′ instead of only receiving one value z, the size n of the dataset, and the distribution from which the data points are drawn.

Experiment 1.

Let 𝒜 be an adversary, A be a learning algorithm, and D and D′ be neighboring datasets. The identifiability experiment proceeds as follows:

1. Choose b←{0, 1} uniformly at random.
2. Let θ=A(D) if b=0 and θ=A(D′) if b=1.
3. Output 1 if 𝒜(D, D′, θ)=b, 0 otherwise. 𝒜 outputs 0 or 1.

Here, the expected value of the membership advantage is calculated to quantify the accuracy of A_(adapt).
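
Experiment 1 can be estimated empirically with a small Monte-Carlo simulation; the sketch below uses a one-dimensional Gaussian mechanism output in place of a trained model and estimates the advantage as twice the adversary's accuracy minus one, which is one common formulation and is used here as an assumption rather than the exact definition referenced above.

```python
import numpy as np

def estimate_membership_advantage(f_D, f_D_prime, sigma, trials, rng):
    """Monte-Carlo estimate of Experiment 1 with a Gaussian mechanism output."""
    correct = 0
    for _ in range(trials):
        b = rng.integers(0, 2)                       # 1. choose b uniformly at random
        mean = f_D if b == 0 else f_D_prime          # 2. output computed on D or D'
        theta = mean + rng.normal(0.0, sigma)
        # 3. the adversary guesses the dataset whose output is closer to the observation
        guess = 0 if abs(theta - f_D) <= abs(theta - f_D_prime) else 1
        correct += int(guess == b)
    accuracy = correct / trials
    return 2.0 * accuracy - 1.0                      # advantage under the stated assumption

rng = np.random.default_rng(1)
adv = estimate_membership_advantage(f_D=17.0, f_D_prime=8.0, sigma=9.0, trials=100_000, rng=rng)
print(f"estimated membership advantage: {adv:.3f}")
```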

In the evaluation of A_(adapt) in a deep learning setting, it was realized that A_(adapt) did not reach the upper confidence bound until the sensitivity was adjusted. In differentially private deep learning, gradients decrease over the course of training until convergence and can fall below the sensitivity or clipping norm. This means that more noise is added than would have been necessary to obscure the difference made by a member of the dataset. Overall, the difference between the lower bound on privacy offered by membership advantage in a membership inference setting and the upper bound offered by maximum adversary confidence includes the auxiliary knowledge of A_(adapt) and the inherent privacy offered in deep learning scenarios through decreasing gradients.

Application to Analytics.

The observed utility gains can also be realized on real-world data in an energy forecasting task that is relevant for German energy providers. In this energy forecasting problem, the energy transmission network is structured into disjoint virtual balancing groups under the responsibility of individual energy providers. Each balancing group consists of multiple zones and each zone consists of individual households. Energy providers have an incentive to forecast the demand in their balancing group with low error to schedule energy production accordingly. Currently, energy providers can utilize the overall aggregated energy consumption per balancing group to calculate a demand forecast, since they have to report these numbers to the transmission system operator. However, with the rollout of smart meters, additional communication channels can be set up and the demand per zone could be computed. Note that, counterintuitively, forecasting on grouped household loads instead of individual households is beneficial for forecasting performance due to reduced stochasticity. Nonetheless, computing the demand per zone reflects the sum of individual household energy consumption and is thus a sensitive task. Here, the use of differential privacy allows one to compute the anonymized energy consumption per zone and mitigate privacy concerns. The energy provider will only have an incentive to apply differential privacy if the forecasting error based on differentially private energy consumption per zone is lower than the forecasting error based on the aggregated zonal loads. Vice versa, the energy provider has to quantify and communicate the achieved privacy guarantee to the balancing group households to gather consent for processing the data.

This forecasting task was based on the dataset and benchmarking model of the 2012 Global Energy Forecast Competition (GEFCom). The GEFCom dataset consists of hourly energy consumption data of 4.5 years from 20 zones for training and one week of data from the same zones for testing. The GEFCom winning model computes the forecast F for a zone z at time t by computing a linear regression over a p-dimensional feature vector x (representing 11 weather stations):

$F_{z,t} = \beta_{0} + \sum_{j=1}^{p} \beta_{j} \cdot x_{t,j} + e_{t}.\qquad(18)$

The sensitive target attribute of the linear regression is the zonal load consumption L_(z,t), i.e., the sum of n household loads l_(z,i,t), i=1, . . . , n. Differential privacy can be added to the sum computation by applying the Gaussian mechanism (cf. Definition 2), yielding:

$L^{\prime}_{z,t} = \sum_{i=1}^{n_{z}} l_{z,i,t} + \mathcal{N}\left(0, \sigma^{2}\right).\qquad(19)$

The energy provider will only benefit if the differentially private load forecast has a smaller error than the aggregated forecast. A suitable error metric for energy forecasting is the Mean Absolute Error (MAE), i.e.:

$\mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|F_{t} - L_{t}\right|.\qquad(20)$
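
Equations (19) and (20) can be sketched together as follows; the household load values and the noise multiplier z are placeholder inputs, and σ=z·Δƒ follows the scaling described for the two Gaussian mechanisms compared here.

```python
import numpy as np

def dp_zonal_load(household_loads, delta_f, z, rng):
    """Equation (19): Gaussian mechanism on the zonal sum of household loads,
    with sigma = z * delta_f (delta_f = 48 kW globally, or S(f) = 15.36 kW)."""
    return float(np.sum(household_loads)) + rng.normal(0.0, z * delta_f)

def mae(forecast, load):
    """Equation (20): mean absolute error between a forecast and the actual zonal load."""
    return float(np.mean(np.abs(np.asarray(forecast) - np.asarray(load))))

# Illustrative usage on synthetic placeholder data: 24 hourly readings for 50 households.
# The perturbed zonal series stands in here for the forecast input of equation (20).
rng = np.random.default_rng(2)
loads = rng.uniform(0.1, 15.36, size=(24, 50))
true_zone = loads.sum(axis=1)
for delta_f in (48.0, 15.36):
    private_zone = [dp_zonal_load(h, delta_f, z=1.0, rng=rng) for h in loads]
    print(delta_f, mae(private_zone, true_zone))
```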

Diagram 1000a of FIG. 10(a) illustrates the forecasting error over 10 independent forecast trainings over increasing privacy parameter ϵ. Note that this illustration was limited to ϵ&lt;0.4216 since the forecasting error already exceeds the aggregate forecast error for this ϵ and continues to increase afterwards. Thus, from a utility perspective the energy provider will have a preference for ϵ&gt;&gt;0.4216. The privacy loss is again analyzed over composition (k=38,070) with RDP for an additive privacy loss δ=10⁻⁹ and a global sensitivity of Δƒ=48 kW, which is the maximum technical power demand fused in German residential homes. However, in practice, households have been observed not to exceed power consumptions of 15.36 kW, which is thus used as an estimate for S(ƒ).

FIG. 10(b) illustrates the corresponding MAE when applying the Gaussian mechanism with Δƒ=S(ƒ)=15.36. Note that for both Gaussian mechanisms noise is sampled from a Gaussian distribution with σ=z·Δƒ, and that an equal z was used for both Gaussian mechanisms. FIG. 10 illustrates that the achievable utility is consistently higher when Δƒ=S(ƒ) is used.

In contrast to the energy provider, it is assumed that households have an interest in ρ_(c)&lt;&lt;1. This leads to the question whether the energy provider and the households have an intersection of their preferred ϵ. FIG. 10(c) maps ρ_(c) and ρ_(α) to MAE over ϵ for δ=10⁻⁹ and either Δƒ=48 or Δƒ=S(ƒ)=15.36. It is observed for ρ_(c)≈0.65, which results in ϵ≈0.6 and ρ_(α)≈0.04, that the use of S(ƒ) instead of global sensitivity allows reducing the maximum MAE by approximately 10 MW, which is significant when considering that the absolute difference between the aggregated forecast MAE and the unperturbed forecast MAE is only ≈12 MW.

FIG. 11 is a process flow diagram 1100 in which, at 1110, data is received that specifies a bound for an adversarial posterior belief ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output. Privacy parameters ε, δ are then calculated, at 1120, based on the received data that govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset. The calculating is based on a ratio of probabilities distributions of different observations, which are bound by the posterior belief ρ_(c) as applied to a dataset. The calculated privacy parameters are then used, at 1130, to apply the DP algorithm to the function over the dataset to result in an anonymized function output (e.g., a machine learning model, etc.).

FIG. 12 is a process flow diagram 1200 in which, at 1210, data is received that specifies privacy parameters ε, δ which govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset. The received data is then used, at 1220, to calculate an expected membership advantage ρ_(α) that corresponds to a likelihood of an adversary successfully identifying a member in the dataset. Such calculating can be based on an overlap of two probability distributions. The calculated expected membership advantage ρ_(α) can be used, at 1230, when applying the DP algorithm to a function over the dataset to result in an anonymized function output (e.g., a machine learning model, etc.).

FIG. 13 is a process flow diagram 1300 in which, at 1310, data is received that specifies privacy parameters ε, δ which govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset. Thereafter, at 1320, the received data is used to calculate an adversarial posterior belief bound ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output. The calculated adversarial posterior belief bound ρ_(c) can then be used, at 1330, when applying the DP algorithm to a function over the dataset to result in an anonymized function output (e.g., machine learning model, etc.).

FIG. 14 is a process flow diagram in which, at 1410, a dataset is received. Thereafter, at 1420, at least one first user-generated privacy parameter is received which governs a differential privacy (DP) algorithm to be applied to a function evaluated over the received dataset. Using the received at least one first user-generated privacy parameter, at least one second privacy parameter is calculated, at 1430, based on a ratio or overlap of probabilities of distributions of different observations. Subsequently, at 1440, the DP algorithm is applied, using the at least one second privacy parameter, to the function over the received dataset. At least one machine learning model can be trained, at 1450, using the dataset which, when deployed, is configured to classify input data.

The machine learning model(s) can be deployed once trained to classifyinput data when received.

The at least one first user-generated privacy parameter can include a bound for an adversarial posterior belief ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output. With such an arrangement, the calculated at least one second privacy parameter can include privacy parameters ε, δ, and the calculating can be based on a conditional probability of distributions of different datasets given a differentially private function output, which are bound by the posterior belief ρ_(c) as applied to the dataset.

In another variation, the at least one first user-generated privacy parameter includes privacy parameters ε, δ. With such an implementation, the calculated at least one second privacy parameter can include an expected membership advantage ρ_(α) that corresponds to a likelihood of an adversary successfully identifying a member in the dataset and the calculating can be based on an overlap of two probability distributions.

In still another variation, the at least one first user-generated privacy parameter can include privacy parameters ε, δ. With such an implementation, the calculated at least one second privacy parameter can include an adversarial posterior belief bound ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output and the calculating can be based on a conditional probability of different possible datasets.

FIG. 15 is a diagram 1500 illustrating a sample computing device architecture for implementing various aspects described herein. A bus 1504 can serve as the information highway interconnecting the other illustrated components of the hardware. A processing system 1508 labeled CPU (central processing unit) (e.g., one or more computer processors/data processors at a given computer or at multiple computers), can perform calculations and logic operations required to execute a program. A non-transitory processor-readable storage medium, such as read only memory (ROM) 1512 and random access memory (RAM) 1516, can be in communication with the processing system 1508 and can include one or more programming instructions for the operations specified here. Optionally, program instructions can be stored on a non-transitory computer-readable storage medium such as a magnetic disk, optical disk, recordable memory device, flash memory, or other physical storage medium.

In one example, a disk controller 1548 can interface with one or more optional disk drives to the system bus 1504. These disk drives can be external or internal floppy disk drives such as 1560, external or internal CD-ROM, CD-R, CD-RW, or DVD, or solid state drives such as 1552, or external or internal hard drives 1556. As indicated previously, these various disk drives 1552, 1556, 1560 and disk controllers are optional devices. The system bus 1504 can also include at least one communication port 1520 to allow for communication with external devices either physically connected to the computing system or available externally through a wired or wireless network. In some cases, the at least one communication port 1520 includes or otherwise comprises a network interface.

To provide for interaction with a user, the subject matter described herein can be implemented on a computing device having a display device 1540 (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information obtained from the bus 1504 via a display interface 1514 to the user and an input device 1532 such as a keyboard and/or a pointing device (e.g., a mouse or a trackball) and/or a touchscreen by which the user can provide input to the computer. Other kinds of input devices 1532 can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback by way of a microphone 1536, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input. The input device 1532 and the microphone 1536 can be coupled to and convey information via the bus 1504 by way of an input device interface 1528. Other computing devices, such as dedicated servers, can omit one or more of the display 1540 and display interface 1514, the input device 1532, the microphone 1536, and input device interface 1528.

One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural language, an object-oriented programming language, a functional programming language, a logical programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example as would a processor cache or other random access memory associated with one or more physical processor cores.

In the descriptions above and in the claims, phrases such as “at least one of” or “one or more of” may occur followed by a conjunctive list of elements or features. The term “and/or” may also occur in a list of two or more elements or features. Unless otherwise implicitly or explicitly contradicted by the context in which it is used, such a phrase is intended to mean any of the listed elements or features individually or any of the recited elements or features in combination with any of the other recited elements or features. For example, the phrases “at least one of A and B;” “one or more of A and B;” and “A and/or B” are each intended to mean “A alone, B alone, or A and B together.” A similar interpretation is also intended for lists including three or more items. For example, the phrases “at least one of A, B, and C;” “one or more of A, B, and C;” and “A, B, and/or C” are each intended to mean “A alone, B alone, C alone, A and B together, A and C together, B and C together, or A and B and C together.” In addition, use of the term “based on,” above and in the claims, is intended to mean “based at least in part on,” such that an unrecited feature or element is also permissible.

The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

What is claimed is:
1. A computer-implemented method for anonymized analysis of datasets comprising: receiving data specifying a bound for an adversarial posterior belief ρ_(c) that corresponds to a likelihood to re-identify data points from a dataset based on a differentially private function output; calculating, based on the received data, privacy parameters ε, δ which govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset, the calculating being based on a ratio of probabilities distributions of different observations which are bound by the posterior belief ρ_(c) as applied to the dataset; and applying, using the calculated privacy parameters ε, δ, the DP algorithm to the function over the dataset.

2. The method of claim 1, wherein the probability distributions are generated using a Gaussian mechanism with an (ε, δ) guarantee that perturbs the result of the function evaluated over the dataset, preventing a posterior belief greater than ρ_(c) on the dataset.

3. The method of claim 1, wherein the probability distributions are generated using a Laplacian mechanism with an ε guarantee that perturbs the result of the function evaluated over the dataset, preventing a posterior belief greater than ρ_(c) on the dataset.

4. The method of claim 1 further comprising: anonymously training at least one machine learning model using the dataset after application of the DP algorithm to the function over the dataset.

5. The method of claim 4 further comprising: deploying the trained at least one machine learning model to classify further data input into the at least one machine learning model.

6. The method of claim 1, wherein ϵ=log(ρ_(c)/(1−ρ_(c))) for a series of (ε, δ) or ε anonymized function evaluations with multidimensional data.

7. The method of claim 4 further comprising: calculating a resulting total posterior belief ρ_(c) using a sequential composition or Rényi differential privacy (RDP) composition; and updating the at least one machine learning model using the calculated resulting total posterior belief ρ_(c).
8. A computer-implemented method for anonymized analysis of datasets comprising: receiving data specifying privacy parameters ε, δ which govern a differential privacy (DP) algorithm to be applied to a function to be evaluated over a dataset; calculating, based on the received data, an expected membership advantage ρ_(α) that corresponds to a likelihood of an adversary successfully identifying a member in the dataset, the calculating being based on an overlap of two probability distributions; and applying, using the calculated expected membership advantage ρ_(α), the DP algorithm to a function over the dataset.

9. The method of claim 8, wherein the probability distributions are generated using a Gaussian mechanism with an (ε, δ) guarantee that perturbs the result of the function evaluated over the dataset, ensuring that membership advantage is ρ_(α) on the dataset.

10. The method of claim 8 further comprising: anonymously training at least one machine learning model using the dataset after application of the DP algorithm to the function over the dataset.

11. The method of claim 10 further comprising: deploying the trained at least one machine learning model to classify further data input into the at least one machine learning model.

12. The method of claim 8, wherein the calculated expected membership advantage ρ_(α) for a series of (ε, δ) anonymized function evaluations with multidimensional data is equal to:

$\mathrm{CDF}\left(\frac{\epsilon}{2\sqrt{2\ln(1.25/\delta)}}\right) - \mathrm{CDF}\left(\frac{-\epsilon}{2\sqrt{2\ln(1.25/\delta)}}\right)$

wherein CDF is the cumulative distribution function of the standard normal distribution.

13. The method of claim 11 further comprising: calculating a resulting expected membership advantage ρ_(α) using sequential composition or Rényi differential privacy (RDP) composition; and updating the at least one machine learning model using the calculated resulting expected membership advantage ρ_(α).
14. A system for training a machine learning model comprising: at least one data processor; memory storing instructions which, when executed by the at least one data processor, result in operations comprising: receiving a dataset; receiving at least one first user-generated privacy parameter which governs a differential privacy (DP) algorithm to be applied to a function evaluated over the received dataset; calculating, based on the received at least one first user-generated privacy parameter, at least one second privacy parameter based on a ratio or overlap of probabilities of distributions of different observations; applying, using the at least one second privacy parameter, the DP algorithm to the function over the received dataset to result in an anonymized function output; and anonymously training at least one machine learning model using the dataset after application of the DP algorithm to the function over the received dataset which, when deployed, is configured to classify input data.

15. The system of claim 14, wherein the operations further comprise: deploying the trained at least one machine learning model; and receiving, by the deployed trained at least one machine learning model, input data.

16. The system of claim 15, wherein the operations further comprise: providing, by the deployed trained at least one machine learning model based on the input data, a classification.

17. The system of claim 14, wherein: the at least one first user-generated privacy parameter comprises a bound for an adversarial posterior belief ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private function output; the calculated at least one second privacy parameter comprises privacy parameters ε, δ; and the calculating is based on the conditional probability of distributions of different datasets given a differentially private function output which are bound by the posterior belief ρ_(c) as applied to the dataset.

18. The system of claim 14, wherein: the at least one first user-generated privacy parameter comprises privacy parameters ε, δ; the calculated at least one second privacy parameter comprises an expected membership advantage ρ_(α) that corresponds to a probability of an adversary successfully identifying a member in the dataset; and the calculating is based on a conditional probability of different possible datasets.

19. The system of claim 14, wherein: the at least one first user-generated privacy parameter comprises privacy parameters ε, δ; and the calculated at least one second privacy parameter comprises an adversarial posterior belief bound ρ_(c) that corresponds to a likelihood to re-identify data points from the dataset based on a differentially private output.

20. The system of claim 19, wherein the calculating is based on a conditional probability of different possible datasets.